
R13.P1.T2.Miller_300_315_Chapters_2_3_4_6_7_v9.0


P1.T2. Quantitative Analysis
Bionic Turtle FRM Practice Questions
Reading 13: Michael Miller, Mathematics and Statistics for Financial Risk Management, 2nd Edition

Probabilities (Chapter 2)

Basic Statistics (Chapter 3)

Distributions (Chapter 4)

Bayesian Analysis (Chapter 6)

Hypothesis Testing and Confidence Intervals (Chapter 7)

This is a super-collection of quantitative practice questions. It represents several years of cumulative history mapped to the current Reading 13 (Miller's Mathematics and Statistics for Financial Risk Management).
By David Harper, CFA FRM CIPM
www.bionicturtle.com


This question set contains not only recently written questions, but also an accumulation of prior questions. The reason we did not delete many of the prior questions is simple: although the FRM's econometrics readings have churned in recent years (specifically, for Probabilities and Statistics, from Gujarati to Stock and Watson to Miller), the learning outcomes (AIMs) have remained essentially unchanged. Further, the testable concepts are generally quite durable over time. For this reason, for example, the Gujarati Q&A is highly relevant. Nevertheless, within an assigned reading, our practice questions are sequenced in reverse chronological order (the most recent questions appear first). For example, in regard to assigned Miller Chapter 3 (Statistics), you will notice there are fully three (3) sets of questions:

Miller Chapter 3 (T2.303 to .308)

Stock & Watson (T2.208 to .213): written in 2012. Optional, but relevant

Gujarati (T2.57 to .82): written prior to 2012. Optional, but relevant

Therefore, do not feel obligated to review all of the questions in this document! Rather, consider the additional questions as merely a supplemental, optional resource for those who wish to spend additional time with the concepts. A fine strategy is to review only the most recent questions ("Miller") within each chapter. The major sections are:

Probabilities (assigned is Miller Chapter 2)
o Most recent BT questions, Miller Chapter 2 (T2.300 to .302)
o Previous BT questions, Stock & Watson Chapter 2 (T2.201 to .207)

Statistics (assigned is Miller Chapter 3)
o Most recent BT questions, Miller Chapter 3 (T2.303 to .308)
o Previous BT questions, Stock & Watson Chapter 3 (T2.208 to .213)
o Previous BT questions, Gujarati (T2.57 to .82)

Distributions (assigned is Miller Chapter 4)
o Most recent BT questions, Miller Chapter 4 (T2.309 to .312)
o Previous BT questions, Rachev Chapters 2 & 3 (T2.110 to .126)

Bayesian Analysis (assigned is Miller Chapter 6)
o Most recent BT questions, Miller Chapter 6 (T2.500 to T2.501)
o Previous BT questions, Miller Chapter 2 (T2.302)

Hypothesis Testing & Confidence Intervals (assigned is Miller Chapter 7)
o Most recent BT questions, Miller Chapter 7 (T2.313 to .315)

Appendix
o Annotated Gujarati (encompassing, highly relevant)


PROBABILITIES - KEY IDEAS

PROBABILITIES (MILLER CHAPTER 2)
P1.T2.300. PROBABILITY FUNCTIONS (MILLER)
P1.T2.301. MILLER'S PROBABILITY MATRIX

PROBABILITIES (STOCK & WATSON CHAPTER 2)
P1.T2.201. RANDOM VARIABLES
P1.T2.202. VARIANCE OF SUM OF RANDOM VARIABLES
P1.T2.203. SKEW AND KURTOSIS (STOCK & WATSON)
P1.T2.204. JOINT, MARGINAL, AND CONDITIONAL PROBABILITY FUNCTIONS
P1.T2.205. SAMPLING DISTRIBUTIONS (STOCK & WATSON)
P1.T2.206. VARIANCE OF SAMPLE AVERAGE
P1.T2.207. LAW OF LARGE NUMBERS AND CENTRAL LIMIT THEOREM (CLT)

STATISTICS - KEY IDEAS

STATISTICS (MILLER, CHAPTER 3)
P1.T2.303. MEAN AND VARIANCE OF CONTINUOUS PROBABILITY DENSITY FUNCTIONS (PDF)
P1.T2.304. COVARIANCE (MILLER)
P1.T2.305. MINIMUM VARIANCE HEDGE (MILLER)
P1.T2.306. CALCULATE THE MEAN AND VARIANCE OF SUMS OF VARIABLES
P1.T2.307. SKEW AND KURTOSIS (MILLER)
P1.T2.308. COSKEWNESS AND COKURTOSIS

STATISTICS (STOCK & WATSON CHAPTER 3)
P1.T2.208. SAMPLE MEAN ESTIMATORS (STOCK & WATSON)
P1.T2.209. T-STATISTIC AND CONFIDENCE INTERVAL
P1.T2.210. HYPOTHESIS TESTING (STOCK & WATSON)
P1.T2.211. TYPE I AND II ERRORS AND P-VALUE (STOCK & WATSON)
P1.T2.212. DIFFERENCE BETWEEN TWO MEANS
P1.T2.213. SAMPLE VARIANCE, COVARIANCE AND CORRELATION (STOCK & WATSON)

STATISTICS (GUJARATI'S ESSENTIALS OF ECONOMETRICS)
P1.T2.57. METHODOLOGY OF ECONOMETRICS
P1.T2.58. RANDOM VARIABLES
P1.T2.59. GUJARATI'S INTRODUCTION TO PROBABILITIES
P1.T2.60. BAYES THEOREM
P1.T2.61. STATISTICAL DEPENDENCE
P1.T2.62. EXPECTATION & VARIANCE OF VARIABLE
P1.T2.64. COVARIANCE OF RANDOM VARIABLES
P1.T2.65. VARIANCE AND CONDITIONAL EXPECTATIONS
P1.T2.66. SKEW & KURTOSIS
P1.T2.67. SAMPLE VARIANCE, COVARIANCE, SKEW, KURTOSIS
P1.T2.68. NORMAL DISTRIBUTION
P1.T2.69. SAMPLING DISTRIBUTION
P1.T2.70. STANDARD ERROR
P1.T2.71. CENTRAL LIMIT THEOREM (CLT)
P1.T2.72. STUDENT'S T DISTRIBUTION
P1.T2.73. CHI-SQUARE DISTRIBUTION
P1.T2.74. F-DISTRIBUTION
P1.T2.75. CONFIDENCE INTERVAL
P1.T2.76. CRITICAL T-VALUES
P1.T2.77. CONFIDENCE INTERVAL
P1.T2.78. ESTIMATOR PROPERTIES
P1.T2.79. HYPOTHESIS TESTING
P1.T2.80. CONFIDENCE INTERVALS
P1.T2.81. TYPE I, TYPE II & P VALUE
P1.T2.82. CHI-SQUARE AND F-RATIO

DISTRIBUTIONS (MILLER CHAPTER 4)
P1.T2.309. PROBABILITY DISTRIBUTIONS I, MILLER CHAPTER 4
P1.T2.310. PROBABILITY DISTRIBUTIONS II, MILLER CHAPTER 4
P1.T2.311. PROBABILITY DISTRIBUTIONS III, MILLER
P1.T2.312. MIXTURE DISTRIBUTIONS

DISTRIBUTIONS (RACHEV CHAPTERS 2 & 3)
P1.T2.110. RACHEV'S DISTRIBUTIONS
P1.T2.111. BINOMIAL & POISSON
P1.T2.112. RACHEV'S PROPERTIES OF NORMAL DISTRIBUTION
P1.T2.113. RACHEV'S EXPONENTIAL
P1.T2.114. WEIBULL DISTRIBUTION
P1.T2.115. GAMMA DISTRIBUTION (RACHEV)
P1.T2.116. BETA DISTRIBUTION (RACHEV)
P1.T2.117. CHI-SQUARE DISTRIBUTION
P1.T2.118. STUDENT'S T DISTRIBUTION
P1.T2.119. LOGNORMAL DISTRIBUTION
P1.T2.120. LOGISTIC DISTRIBUTION
P1.T2.121. EXTREME VALUE DISTRIBUTIONS
P1.T2.122. STABLE DISTRIBUTIONS
P1.T2.123. HAZARD RATE OF EXPONENTIAL VARIABLE
P1.T2.124. EXPONENTIAL VERSUS POISSON
P1.T2.125. GENERALIZED PARETO DISTRIBUTION (GPD)
P1.T2.126. MIXTURES OF DISTRIBUTIONS

BAYESIAN ANALYSIS (MILLER CHAPTER 6)
P1.T2.500. BAYES THEOREM (MILLER CHAPTER 6)
P1.T2.501. MORE BAYES THEOREM (MILLER CHAPTER 6)
P1.T2.302. BAYES' THEOREM (MILLER) (OLDER SET OF QUESTIONS)

HYPOTHESIS TESTING & CONFIDENCE INTERVALS (MILLER CHAPTER 7)
P1.T2.313. MILLER'S HYPOTHESIS TESTING
P1.T2.314. MILLER'S ONE- AND TWO-TAILED HYPOTHESES
P1.T2.315. MILLER'S HYPOTHESIS TESTS, CONTINUED

APPENDIX: MORE GUJARATI ECONOMETRICS

GUJARATI: ESSENTIALS OF ECONOMETRICS, 3RD EDITION

CHAPTERS 1-5
GUJARATI.02.12, GUJARATI.02.13
GUJARATI.03.08, GUJARATI.03.09, GUJARATI.03.10, GUJARATI.03.17, GUJARATI.03.21
GUJARATI.04.01, GUJARATI.04.03, GUJARATI.04.04, GUJARATI.04.06, GUJARATI.04.11, GUJARATI.04.15, GUJARATI.04.17, GUJARATI.04.18, GUJARATI.04.20
GUJARATI.05.01, GUJARATI.05.02, GUJARATI.05.03, GUJARATI.05.04, GUJARATI.05.09, GUJARATI.05.10, GUJARATI.05.13, GUJARATI.05.14, GUJARATI.05.17, GUJARATI.05.18, GUJARATI.05.19, GUJARATI.05.20


Probabilities - Key Ideas

Risk measurement is largely the quantification of uncertainty. We quantify uncertainty by characterizing outcomes with random variables. Random variables have distributions which are either discrete or continuous.

In general, we observe samples, and use them to make inferences about a population (in practice, we tend to assume the population exists but is not available to us)

We are concerned with the first four moments of a distribution:

o Mean

o Variance, the square of the standard deviation. Annualized standard deviation is called volatility; e.g., 12% volatility per annum.

o Skew (a function of the third moment about the mean): a symmetrical distribution has zero skew or skewness

o Kurtosis (a function of the fourth moment about the mean).

The normal distribution has kurtosis = 3.0

As excess kurtosis = kurtosis - 3.0, a normal distribution has zero excess kurtosis

Kurtosis > 3.0 refers to a heavy-tailed distribution (a.k.a., leptokurtosis), which will also tend to have a higher peak.

The concepts of joint, conditional and marginal probability are important.

To test a hypothesis about a sample mean (i.e., is the true population mean different than some value), we use a student t or normal distribution

o Student t if the population variance is unknown (it usually is unknown)

o If the sample is large, the student t remains applicable, but because it approximates the normal for large samples, in practice the normal is used since the difference is not material

To test a hypothesis about a sample variance, we use the chi-squared

To test a joint hypothesis about regression coefficients, we use the F distribution

In regard to the normal distribution:

o N(mu, σ^2) indicates the only two parameters required. For example,

N(3,10) connotes a normal distribution with mean of 3 and variance of 10 and, therefore, standard deviation of SQRT(10)

o The standard normal distribution is N(0,1) and therefore requires no parameter specification: by definition it has mean of zero and variance of 1.0.

o Please memorize, with respect to the standard normal distribution:

For N(0,1) Pr(Z < -2.33) ~= 1.0% (CDF is one-tailed)

For N(0,1), Pr(Z < -1.645) ~= 5.0% (CDF is one-tailed)
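These two quantiles are easy to verify numerically. A minimal Python sketch using scipy.stats (our own check, not part of the assigned reading):

```python
# Sketch: verify the two must-know standard normal quantiles with scipy.
from scipy.stats import norm

print(norm.cdf(-2.33))   # ~0.0099, i.e., Pr(Z < -2.33) ~= 1.0%
print(norm.cdf(-1.645))  # ~0.0500, i.e., Pr(Z < -1.645) ~= 5.0%
print(norm.ppf(0.01))    # ~ -2.326, the exact 1% quantile
print(norm.ppf(0.05))    # ~ -1.645, the exact 5% quantile
```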


The definition of a random sample is technical: the draws (or trials) are independent and identically distributed (i.i.d.)

o Identical: same distribution

o Independence: no correlation (in a time series, no autocorrelation)

The assumption of i.i.d. is a precondition for:

o Law of large numbers

o Central limit theorem (CLT)

o Square root rule (SRR) for scaling volatility; e.g., we typically scale a daily volatility of (V) to an annual volatility with V*SQRT(250). Please note that i.i.d. returns are the (unrealistic) precondition.
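As an illustration of the square root rule, a minimal Python sketch (the 1.6% daily volatility is an arbitrary, assumed input; 250 trading days per year is a convention, not a law):

```python
# Sketch: scaling daily volatility to an annual horizon under the SRR.
# Valid only under the i.i.d. assumption noted above.
import math

daily_vol = 0.016                        # assumed 1.6% daily volatility
annual_vol = daily_vol * math.sqrt(250)  # square root rule (SRR)
print(round(annual_vol, 4))              # ~0.2530, about 25.3% per annum
```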


Probabilities (Miller Chapter 2) P1.T2.300. Probability functions (Miller) P1.T2.301. Miller's probability matrix

P1.T2.300. Probability functions (Miller)

AIMs: Describe the concept of probability. Describe and distinguish between continuous and discrete random variables. Define and distinguish between the probability density function, the cumulative distribution function and the inverse cumulative distribution function, and calculate probabilities based on each of these functions.

300.1. Assume the probability density function (pdf) of a zero-coupon bond with a notional value of $10.00 is given by f(x) = x/8 - 0.75 on the domain [6,10] where x is the price of the bond:

What is the probability that the price of the bond is between $8.00 and $9.00?

a) 25.750% b) 28.300% c) 31.250% d) 44.667%

300.2. Assume the probability density function (pdf) of a zero-coupon bond with a notional value of $5.00 is given by f(x) = (3/125)*x^2 on the domain [0,5] where x is the price of the bond:

Although the mean of this distribution is $3.75, assume the expected final payoff is a return of the full par of $5.00. If we apply the inverse cumulative distribution function and find the price of the bond (i.e., the value of x) such that 5.0% of the distribution is less than or equal to (x), let this price be represented by q(0.05); in other words, a 5% quantile function. If the 95.0% VaR is given by -[q(0.05) - 5] or [5 - q(0.05)], which is nearest to this 95.0% VaR?

a) $1.379 b) $2.842 c) $2.704 d) $3.158


300.3. Assume a loss severity given by (x) can be characterized by a probability density function (pdf) on the domain [1, e^5]. For example, the minimum loss severity = $1 and the maximum possible loss severity = exp(5) ~= $148.41. The pdf is given by f(x) = c/x as follows:

What is the 95.0% value at risk (VaR); i.e., given that losses are expressed in positive values, at what loss severity value (x) is only 5.0% of the distribution greater than (x)?

a) $54.42 b) $97.26 c) $115.58 d) $139.04


Answers:

300.1. C. 31.250%
The anti-derivative is F(x) = x^2/16 - 0.75*x + c. We can confirm f(x) is a valid pdf by evaluating F(x) over the domain [6, 10]: [10^2/16 - 0.75*10] - [6^2/16 - 0.75*6] = -1.25 - (-2.25) = 1.0. Probability [8 <= x <= 9] = [9^2/16 - 0.75*9] - [8^2/16 - 0.75*8] = -1.68750 - (-2.000) = 31.250%

300.2. D. $3.158
As f(x) = (3/125)*x^2, F(x) = (3/125)*(1/3)*x^3 = x^3/125 = p. Solving for x: x = (125*p)^(1/3) = 5*p^(1/3). For p = 5%, x = 5*5%^(1/3) = $1.8420. As q(0.05) = $1.8420, 95% VaR = $5.00 - $1.8420 = $3.1580

300.3. C. $115.58
We need d/dx [ln(x)] = 1/x; see http://en.wikipedia.org/wiki/Natural_logarithm#The_natural_logarithm_in_integration
If f(x) = c/x, then the anti-derivative is F(x) = c*ln(x) + a. It must be the case that, under a probability function, F(e^5) - F(1) = 1.0; since ln(1) = 0, this gives 1.0 = c*ln(e^5) = 5c, and therefore c = 1/5. Put another way, under the definition of a probability:

1.0 = Integral over [1, e^5] of (c/x) dx = Integral over [1, e^5] of 1/(5x) dx = ln(x)/5 evaluated on [1, e^5] = 5/5 - 0 = 1.0

As F(x) = p = ln(x)/5, now solving for x: 5p = ln(x), and taking exp() of both sides: x = exp(5p), such that for the 95% quantile function: exp(5*0.95) = $115.58

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-300-probability-functions-miller.6728/
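For readers who want to double-check these integrals, here is a short Python sketch using scipy's numerical integration and root-finding (our own verification, not part of the original answers):

```python
# Sketch: numerically confirm the three answers above with scipy.
import math
from scipy.integrate import quad
from scipy.optimize import brentq

# 300.1: f(x) = x/8 - 0.75 on [6, 10]
f1 = lambda x: x / 8 - 0.75
print(quad(f1, 6, 10)[0])   # ~1.0, confirms a valid pdf
print(quad(f1, 8, 9)[0])    # ~0.3125 = 31.250%

# 300.2: f(x) = (3/125)*x^2 on [0, 5]; find q such that CDF(q) = 5%
f2 = lambda x: (3 / 125) * x ** 2
q05 = brentq(lambda q: quad(f2, 0, q)[0] - 0.05, 0.01, 5)
print(5 - q05)              # ~3.158, the 95% VaR

# 300.3: f(x) = 1/(5x) on [1, e^5]; 95% quantile solves ln(x)/5 = 0.95
print(math.exp(5 * 0.95))   # ~115.58
```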


P1.T2.301. Miller's probability matrix

AIMs: Calculate the probability of an event given a discrete probability function. Distinguish between independent and mutually exclusive events. Define joint probability, describe a probability matrix and calculate joint probabilities using probability matrices.

301.1. A random variable is given by the discrete probability function f(x) = P[X = x(i)] = a*X^3 such that x(i) is a member of {1, 2, 3} and (a) is a constant. That is, X has only three discrete outcomes. What is the probability that X will be greater than its mean? (bonus: what is the distribution's variance?)

a) 45.8% b) 50.0% c) 62.3% d) 75.0%

301.2. A credit asset has a principal value of $6.0 with probability of default (PD) of 3.0% and a loss given default (LGD) characterized by the following continuous probability density function (pdf): f(x) = x/18 such that 0 ≤ x ≤ $6. Let expected loss (EL) = E[PD*LGD]. If PD and LGD are independent, what is the asset's expected loss? (note: why does independence matter?)

a) $0.120 b) $0.282 c) $0.606 d) $1.125


301.3. In analyzing a company, Analyst Sam prepared a probability matrix which is a joint (aka, bivariate) probability mass function that characterizes two discrete variables, equity performance versus a benchmark (over or under) and bond rating change. The company's equity performance will result in one of three mutually exclusive outcomes: under-perform, track the benchmark, or over-perform. The company's bond will either be upgraded, downgraded, or remain unchanged. Unfortunately, before Sam could share his probability matrix, he spilled coffee on it, and unfortunately some cells are not visible.

Two questions: what is the joint Prob [equity over-performs, bond has no change]; and are the two discrete variables independent?

a) 7.0%, yes b) 12.0%, yes c) 19.0%, no d) 22.0%, no


Answers:

301.1. D. 75.0%
Because it is a probability function, a*1^3 + a*2^3 + a*3^3 = 1.0; i.e., 1a + 8a + 27a = 1.0, such that a = 1/36. Mean = 1*(1/36) + 2*(8/36) + 3*(27/36) = 2.722. P[X > 2.722] = P[X = 3] = (1/36)*3^3 = 27/36 = 75.0%
Bonus: Variance = (1 - 2.722)^2*(1/36) + (2 - 2.722)^2*(8/36) + (3 - 2.722)^2*(27/36) = 0.2562, with standard deviation = SQRT(0.2562) = 0.506135

301.2. A. $0.120
If PD and LGD are not independent, then E[PD*LGD] <> E(PD)*E(LGD); for example, if they are positively correlated, then E[PD*LGD] > E(PD)*E(LGD). For the E[LGD], we integrate the pdf: if f(x) = x/18 s.t. 0 ≤ x ≤ $6, then F(x) = (1/18)*(1/2)*x^2 = x^2/36 (note this satisfies the definition of a probability over the domain [0, 6] as 6^2/36 = 1.0). The mean of f(x) integrates x*f(x), where x*f(x) = x*x/18 = x^2/18, which integrates to (1/18)*(x^3/3) = x^3/54, so E[LGD] = 6^3/54 = $4.0.

Therefore, the expected loss = E[PD * LGD] = 3.0%*$4.0 = $0.120.


301.3. C. 19.0%, no
Joint Prob[under-perform, upgrade] = 4%, such that marginal (aka, unconditional) Prob[upgrade] = 4% + 8% + 11% = 23%. The marginal (unconditional) Prob[no change] = 100% - 23% - 13% = 64%, and therefore: Joint Prob[over-perform, no change] = 64% - 15% - 30% = 19.0%. The variables are independent if and only if (iff) the joint probability is equal to the product of the marginal pmfs (pdfs). In this case, joint Prob[over-perform, no change] = 19.0% but the product of marginals = 32%*64% = 20.48%; i.e., 19% <> Prob[over-perform]*Prob[no change]

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-301-millers-probability-matrix.6757/
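The same arithmetic in a short Python sketch (the cell values are the ones recovered in the answer above; the coffee-stained matrix itself is not reproduced here):

```python
# Sketch: the 301.3 independence check as plain arithmetic.
p_upgrade = 0.04 + 0.08 + 0.11                   # marginal Pr[upgrade] = 23%
p_no_change = 1.0 - p_upgrade - 0.13             # marginal Pr[no change] = 64%
joint_over_nochange = p_no_change - 0.15 - 0.30  # = 19%
p_over = 0.32                                    # marginal Pr[over-perform]

print(joint_over_nochange)   # 0.19
print(p_over * p_no_change)  # 0.2048 != 0.19 -> NOT independent
```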


Probabilities (Stock & Watson Chapter 2) P1.T2.201. Random variables P1.T2.202. Variance of sum of random variables P1.T2.203. Skew and kurtosis (Stock & Watson) P1.T2.204. Joint, marginal, and conditional probability functions P1.T2.205. Sampling distributions (Stock & Watson) P1.T2.206. Variance of sample average P1.T2.207. Law of large numbers and Central Limit Theorem (CLT)

P1.T2.201. Random variables

Define random variables, and distinguish between continuous and discrete random variables. Define the probability of an event. Define, calculate, and interpret the mean, standard deviation, and variance of a random variable.

201.1. Which of the following is most likely to be characterized by a DISCRETE random variable, and consequently, a discrete probability distribution (aka, probability mass function, PMF) and/or a discrete CDF?

a) The future price of a stock under the lognormal assumption (geometric Brownian motion, GBM) that underlies the Black-Scholes-Merton (BSM)
b) The extreme loss tail under extreme value theory (EVT; i.e., GEV or GPD)
c) The empirical losses under the simple historical simulation (HS) approach to value at risk (VaR)
d) The sampling distribution of the sample variance

201.2. A model of the frequency of losses (L) per day, for a certain key operational process, assumes the following discrete distribution: zero loss (events per day) with probability (p) = 20%; one loss with p = 30%; two losses with p = 30%; three losses with p = 10%; and four losses with p = 10%. What are, respectively, the expected (average) number of loss events per day, E(L), and the standard deviation of the number of loss events per day, StdDev(L)?

a) E(L) = 1.20 and StdDev(L) = 1.44 b) E(L) = 1.60 and StdDev(L) = 1.20 c) E(L) = 1.80 and StdDev(L) = 2.33 d) E(L) = 2.20 and StdDev(L) = 9.60


201.3. A volatile portfolio produced the following daily returns over the prior five days (in percentage terms, %, for convenience): +5.0, -3.0, +6.0, -1.0, +3.0. Although this is a tiny sample, we have two ways to calculate the daily volatility. The first is to compute a technically proper daily volatility as an unbiased sample standard deviation. The second, a common practice for short-period/daily returns, is to make two simplifying assumptions: assume the mean return is zero since these are daily periods, and divide the sum of squared returns by (n) rather than (n-1). For this sample of only five daily returns, what is respectively (i) the sample daily volatility and (ii) the simplified daily volatility?

a) 1.65 (sample) and 2.55 (simplified) b) 2.96 (sample) and 3.00 (simplified) c) 4.11 (sample) and 3.65 (simplified) d) 3.87 (sample) and 4.00 (simplified)

201.4. Consider the following five random variables:

A standard normal random variable; no parameters needed.

A student's t distribution with 10 degrees of freedom; df = 10.

A Bernoulli variable that characterizes the probability of default (PD), where PD = 4%; p = 0.040

A Poisson distribution that characterizes the frequency of operational losses during the day, where lambda = 5.0

A binomial variable that characterizes the number of defaults in a basket credit default swap (CDS) of 50 bonds, each with PD = 2%; n = 50, p = 2%

Which of the above has, respectively, the lowest value and highest value as its variance among the set?

a) Standard normal (lowest) and Bernoulli (highest) b) Binomial (lowest) and Student's t (highest) c) Bernoulli (lowest) and Poisson (highest) d) Poisson (lowest) and Binomial (highest)


Answers:

201.1. C. The empirical losses under the simple historical simulation (HS) approach to value at risk (VaR).
Historical simulation sorts actual losses (e.g., daily), which informs an empirical and discrete distribution. Put another way, note that identifying the VaR is basically an exercise in identifying the quantile based on a "counting-type" distribution of losses; e.g., -100, -98, -97, ... -40. Another view is that in a discrete distribution p(X = x) = f(x); contrast with a continuous distribution, where the probability of any exact value is only the infinitesimal f(x)*dx, so that nonzero probabilities attach to intervals. In a simple historical simulation of 100 losses, the probability of the worst loss, or any loss, is 1/100 = 1.0% = f(x). (Note: per Dowd, there are kernel methods to effectively transform a discrete empirical distribution into a continuous pdf, but this question says "simple" HS!)

In regard to (A), lognormal is continuous.

In regard to (B), EVT approaches are parametric continuous.

In regard to (D), the sampling distribution of the sample variance is characterized by the continuous chi-squared distribution; i.e., we use chi-square to test the significance of a sample variance.

201.2. B. E(L) = 1.60 and StdDev(L) = 1.20 E(L) = 20%*0 + 30%*1 + 30%*2 + 10%*3 + 10%*4 = 1.6; Variance(L) = (0 - 1.6)^2*20% + (1 - 1.6)^2*30% + (2 - 1.6)^2*30% + (3 - 1.6)^2*10% + (4 - 1.6)^2*10% = 1.44; Standard deviation (L) = SQRT(1.44) = 1.20.

Please note: as we are given ex ante probabilities, and not an empirical sample, there is no application of sample variance concept here; i.e., as this is not a sample and our variance is not an estimate (the value produced by an estimator), we do not need to divide the sum of squared differences by (n-1).

201.3. D. 3.87 (sample) and 4.00 (simplified)
The average return = +2. The sum of squared differences = (5-2)^2 + (-3-2)^2 + (6-2)^2 + (-1-2)^2 + (3-2)^2 = 60. The sample variance = 60/(n-1) = 15, such that the sample standard deviation = SQRT(15) = 3.8730. The simplified standard deviation = SQRT[(5^2 + 3^2 + 6^2 + 1^2 + 3^2)/5] = SQRT(16) = 4.0. While assuming that the mean = 0 is a simplifying assumption, the division by n = 5 rather than n = 4 merely relies on a different but valid estimator (MLE rather than unbiased).

201.4. C. Bernoulli (lowest) and Poisson (highest)
In order:

Bernoulli has variance = p(1-p) = 4%*96% = 0.0384

Binomial has variance = p(1-p)n = 2%*98%*50 = 0.980

Standard normal has, by definition, mean = 0 and variance = 1.0

Student's t has variance = df/(df-2) = 10/8 = 1.25

Poisson has lambda = variance = mean = 5 <-- easy to remember, yes?!
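A short Python sketch reproducing the 201.2 and 201.3 calculations (our own check, not part of the original answers):

```python
# Sketch: reproduce 201.2 and 201.3 with plain Python.
import math

# 201.2: discrete loss-frequency distribution
losses, probs = [0, 1, 2, 3, 4], [0.2, 0.3, 0.3, 0.1, 0.1]
mean = sum(l * p for l, p in zip(losses, probs))               # 1.6
var = sum((l - mean) ** 2 * p for l, p in zip(losses, probs))  # 1.44
print(mean, math.sqrt(var))                                    # 1.6, 1.2

# 201.3: unbiased sample volatility vs. the simplified (zero-mean, /n) version
r = [5.0, -3.0, 6.0, -1.0, 3.0]
m = sum(r) / len(r)                                            # mean = 2.0
sample_vol = math.sqrt(sum((x - m) ** 2 for x in r) / (len(r) - 1))
simple_vol = math.sqrt(sum(x ** 2 for x in r) / len(r))
print(round(sample_vol, 4), round(simple_vol, 2))              # 3.873, 4.0
```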

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-201-random-variables.4951/


P1.T2.202. Variance of sum of random variables

AIM: Calculate the mean and variance of sums of random variables.

202.1. A high growth stock has a daily return volatility of 1.60%. The returns are positively autocorrelated such that the correlation between consecutive daily returns is +0.30. What is the two-day volatility of the stock?

a) 1.800% b) 2.263% c) 2.580% d) 3.200%

202.2. A three-bond portfolio contains three par $100 junk bonds with respective default probabilities of 4%, 8% and 12%. Each bond either defaults or repays in full (three Bernoulli variables). The bonds are independent; their default correlation is zero. Finally, for convenience, recovery is assumed to be zero (LGD = 100%). What is, respectively, the mean value of the three-bond portfolio and the standard deviation of the portfolio's value?

a) mean $276.00 and StdDev $46.65 b) mean $276.00 and StdDev $139.94 c) mean $276.00 and StdDev $2,176.45 d) mean $313.00 and StdDev $94.25

202.3. Assume two random variables X and Y. The variance of Y = 49 and the correlation between X and Y is 0.50. If the variance[2X - 4Y] = 652, which is a solution for the standard deviation of X?

a) 2.0 b) 3.0 c) 6.0 d) 9.0

202.4 A risky bond has a (Bernoulli) probability of default (PD) of 7.0% with loss given default (LGD) of 60.0%. The LGD has a standard deviation of 40.0%. The correlation between LGD and PD is 0.50. What is the bond's expected loss, E[L] = E[PD * LGD]?

a) 3.1% b) 4.2% c) 7.5% d) 9.3%


202.5. Portfolio (P) is equally-weighted in two positions: a 50% position in StableCo (S) plus a 50% position in GrowthCo (G). Volatility of (S) is 9.0% and volatility of (G) is 19.0%. Correlation between (S) and (G) is 0.20. The beta of GrowthCo (G) with respect to the portfolio--denoted Beta (G, P)--is given by the covariance(G,P)/variance(P) where P = 0.5*G + 0.5*S. What is beta(G, P)?

a) 0.45 b) 0.88 c) 1.39 d) 1.55

202.6. Two extremely risky bonds have unconditional probabilities of default (Bernoulli PDs) of 10% and 20%. Their (linear) correlation is 0.35. What is the probability that both bonds default?

a) 2.0% b) 4.6% c) 6.2% d) 9.7%


Answers: 202.1. C. 2.580% variance (X+Y) = variance(X) + variance (Y) + 2*covariance(X,Y), such that volatility (R1 + R2) = SQRT[variance(R1) + variance (R2) + 2*covariance(R1,R2)], such that where R = 1.6%, Two-day volatility (R1 + R2) = SQRT[1.6%^2 + 1.6%^2 + 2*1.6%*1.6%*0.30] = 2.580%

Please note, per the positive autocorrelation, this 2.58% is greater than the two-day volatility per the square root rule (SRR) which assumes independence:

if independent (i.e., correlation = 0), then the 2-day volatility = 1.60%*SQRT(2) = 2.263%. Answer (B) is incorrect because it implicitly assumes zero correlation between consecutive returns.

202.2. A. mean $276.00 and StdDev $46.65
The mean = 96%*100 + 92%*100 + 88%*100 = $276. Let B1, B2, and B3 represent the random Bernoulli variables. If independent, variance(100*B1 + 100*B2 + 100*B3) = variance(100*B1) + variance(100*B2) + variance(100*B3) = 100^2*[variance(B1) + variance(B2) + variance(B3)] = 10,000*[4%*96% + 8%*92% + 12%*88%] = 2,176. StdDev = SQRT(2,176) = $46.65

202.3. B. 3.0
Key formulas are: variance(X - Y) = variance(X) + variance(Y) - 2*covariance(X,Y), and variance(aX) = a^2*variance(X). In this case, var(2X - 4Y) = 4var(X) + 16var(Y) - 2*2*4*StdDev(X)*StdDev(Y)*correlation(X,Y). Given var(Y) = 49:
var(2X - 4Y) = 4var(X) + 16*49 - 2*2*4*StdDev(X)*7*0.5, and:
var(2X - 4Y) = 4var(X) + 784 - 56*StdDev(X), so that:
652 = 4var(X) + 784 - 56*StdDev(X), i.e., 0 = 4var(X) - 56*StdDev(X) + 132.
If we let w = StdDev(X), we can express: 0 = 4w^2 - 56w + 132, which factors: 0 = (4w - 12)(w - 11), such that either 4w - 12 = 0 and w = 3, or w = 11. So this has two solutions, but one is StdDev(X) = 3 such that variance(X) = 9.


202.4. D. 9.3%
If PD and LGD are independent, then EL = PD*LGD. In general, E[XY] = E[X]*E[Y] + covariance(X,Y), so:
E[PD * LGD] = E[PD]*E[LGD] + covariance(PD, LGD) = E[PD]*E[LGD] + StdDev(PD)*StdDev(LGD)*correlation(PD, LGD) = 7%*60% + SQRT(7%*93%)*40%*0.5 = 9.303%

202.5. D. 1.55
variance(P) = 50%^2*9%^2 + 50%^2*19%^2 + 2*50%*50%*9%*19%*0.2 = 0.0127600
cov(G,P) = cov(G, 0.5G + 0.5S) = 0.5cov(G,G) + 0.5cov(G,S) = 0.5var(G) + 0.5cov(G,S). In this case, cov(G,P) = 0.5*19%^2 + 0.5*9%*19%*0.2 = 0.019760
beta(G,P) = cov(G,P)/variance(P) = 0.019760/0.0127600 = 1.5486

202.6. C. 6.2%
Let the respective defaults be represented by X and Y. We want E[XY] = E[X]*E[Y] + covariance(X,Y).
StdDev(X) = SQRT(10%*90%) = 0.30 and StdDev(Y) = SQRT(20%*80%) = 0.40, so that E[XY] = 10%*20% + 0.30*0.40*0.35 = 6.2%.

Please note: if the bonds were independent, then cov(X,Y) = 0, such that E[XY] = 2.0%
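The formulas behind 202.1, 202.4 and 202.6 generalize readily; a minimal Python sketch (the function names are our own):

```python
# Sketch: generic machinery behind the 202 answers.
import math

def sum_vol(s1, s2, rho):
    """Volatility of a sum of two random variables."""
    return math.sqrt(s1**2 + s2**2 + 2 * s1 * s2 * rho)

print(sum_vol(0.016, 0.016, 0.30))  # 202.1: ~0.02580 two-day volatility

def e_product(e_x, e_y, s_x, s_y, rho):
    """E[XY] = E[X]*E[Y] + cov(X,Y) for any two random variables."""
    return e_x * e_y + s_x * s_y * rho

# 202.4: PD is Bernoulli, so StdDev(PD) = SQRT(p*(1-p))
print(e_product(0.07, 0.60, math.sqrt(0.07 * 0.93), 0.40, 0.5))  # ~0.0930
# 202.6: joint default probability of two correlated Bernoullis
print(e_product(0.10, 0.20, 0.30, 0.40, 0.35))                   # 0.062
```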

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-202-variance-of-sum-of-random-variables.4967/


P1.T2.203. Skew and kurtosis (Stock & Watson)

AIM: Define, calculate, and interpret the skewness, and kurtosis of a distribution

203.1. Let random variable W be distributed normally as N(0,10). What are, respectively, the following: i. The fourth moment of W, E[W^4]; and ii. The kurtosis of W?

a) 30.0 (4th moment) and zero (kurtosis) b) 100.0 and 3.0 c) 300.0 and zero d) 300.0 and 3.0

203.2. A random variable (X) has three possible outcomes: 90.0 with 40% probability; 100.0 with 50% probability; and 110.0 with 10% probability. What is the skewness of the variable's distribution?

a) -1.82 b) -0.95 c) 0.37 d) 0.74

203.3. An analyst gathered information about the return distributions of two portfolios during the same period: Portfolio A shows skewness of 0.9 and kurtosis of 3.7; Portfolio B shows skewness of 1.3 and kurtosis of 2.1. The analyst asserts "Portfolio A is more peaked--that is, has a higher peak--than a normal distribution and Portfolio B has a long right tail."

a) The analyst is correct about both portfolios b) The analyst is correct about A but incorrect about B c) The analyst is correct about B but incorrect about A d) The analyst is wrong about both portfolios


Answers: 203.1. D. 300.0 (fourth moment) and 3.0 (kurtosis) N(0, 10) signifies a normal distribution with mean = 0, variance = 10; and, by definition, skew = 0, and kurtosis = 3.0. Kurtosis = E[(W-E[W])^4]/sigma(W)^4 = E[(W-E[W])^4]/variance(W)^2. In this case, therefore, 3 = E[(W-0)^4]/10^2 = E[W^4]/100. The fourth moment E[W^4] = 3*100 = 300. By the way, the fourth moment about the mean, E[(W-E[W])^4], is therefore also equal to 300, since the mean is zero. In a normal distribution, "excess kurtosis" is equal to zero as excess kurtosis = kurtosis - 3.0. 203.2. C. 0.37 Skewness = E[(X - E[X])^3]/variance(X)^(3/2). Average (X) = 97.0; E[(X - E[X])^3] = 40%*(90-97)^3 + 50%*(100-97)^3 + 10%*(110-97)^3 = 40%*-343 + 50%*27 + 10%*2197 = -137.2 + 13.5 + 219.7 = 96.0; Variance (i.e., second moment about the mean) = 40%*7^2 + 50%*3^2 + 10%*13^2 = 41.0; Skewness = 96.0/41.0^1.5 = 0.3657 203.3. A. The analyst is correct about both portfolios Portfolio A has kurtosis greater than 3.0 and, most importantly, is heavy-tailed but also leptokurtic (higher peaked). Portfolio B has positive skew and therefore has a longer right tail. Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-203-skew-and-kurtosis-stock-watson.5223/
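The 203.2 third-moment calculation in a short Python sketch (our own check):

```python
# Sketch: skewness = third central moment / variance^(3/2), per 203.2.
x, p = [90.0, 100.0, 110.0], [0.40, 0.50, 0.10]
mean = sum(xi * pi for xi, pi in zip(x, p))               # 97.0
var = sum((xi - mean) ** 2 * pi for xi, pi in zip(x, p))  # 41.0
third = sum((xi - mean) ** 3 * pi for xi, pi in zip(x, p))  # 96.0
print(third / var ** 1.5)                                 # ~0.3657
```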


P1.T2.204. Joint, marginal, and conditional probability functions

AIM: Describe Joint, marginal, and conditional probability functions

204.1. X and Y are discrete random variables with the following joint distribution; e.g., Pr (X = 4, Y = 30) = 0.07.

What is the conditional standard deviation of Y given X = 7; i.e., Standard Deviation(Y) | X = 7?

a) 10.3 b) 14.7 c) 21.2 d) 29.4

204.2. Sally's commute (C) is either long (L) or short (S). While commuting, it either rains (R = Y) or it does not (R = N). Today, the marginal (aka, unconditional) probability of no rain is 75%; P(R = N) = 75%. The joint probability of rain and a short commute is 10%; i.e., P(R = Y, C = S) = 10%. What is the probability of a short commute conditional on it being rainy, P (C = S | R = Y)?

a) 10% b) 25% c) 40% d) 68%

204.3. Economists predict the economy has a 40% of experiencing a recession in 2012; marginal P(R) = 40% and therefore the marginal probability of no recession P(R') = 60%. Let P(S) be the probability the S&P 500 index ends the year above 1400, such that P(S') is the probability the index does not end the year above 1400. If there is a recession, the probability of the index ending the year above 1400 is only 30%; P(S|R) = 30%. If there is not a recession, the probability of the index ending above 1400 is 50%; P(S|R') = 50%. Bayes' Theorem tells us that the conditional probability, P(R|S), is equal to the joint probability P(R,S) divided by the marginal probability, P(S). At the end of the year, the index does end above 1400, such that we observe (S) not (S'). What is the probability of a recession conditional on the index ending above 1400; i.e., P(R|S)?

a) 12.0% b) 28.6% c) 40.0% d) 42.0%


Answers:

204.1. A. 10.3 E(Y|X=7) = 10*(0.05/0.32) + 20*(0.03/0.32) +30*(0.13/0.32) + 40*(0.11/0.32)= 29.375. E(Y^2|X) = 10^2*(0.05/0.32) + 20^2*(0.03/0.32) +30^2*(0.13/0.32) + 40^2*(0.11/0.32)= 968.75 Variance(Y|X=7) = 968.75 - 29.375^2 = 105.8594. StdDev(Y|X=7) = SQRT(105.8594) = 10.289. 204.2. C. 40% The conditional probability Pr(C = S | R = Y ) = Pr(C = S, R = Y ) / Pr (R = Y). The marginal probability of rain Pr (R = Y) = 1 - 75% = 25%; such that The conditional probability Pr(C = S | R = Y ) = 10% / 25% = 40%. 204.3. B. 28.6% According to Bayes, P(R|S) = P(R,S) / P(S). In this case, P(R,S) = P(R)*P(S|R) = 40%*30% = 12%. P(S) = P(R)*P(S|R) + P(R')*P(S|R') = 12% + 60%*50% = 12% + 30% = 42%. Such that, P(R|S) = 12%/42% = 28.6%; i.e., the ex post knowledge of (S) decreases the conditional probability of recession from its marginal probability of 40%. Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-204-joint-marginal-and-conditional-probability-functions-stock-watson.5236/
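The 204.3 Bayes calculation as a short Python sketch (variable names are our own):

```python
# Sketch: Bayes' theorem arithmetic from 204.3.
p_r = 0.40           # marginal P(recession)
p_s_given_r = 0.30   # P(index > 1400 | recession)
p_s_given_nr = 0.50  # P(index > 1400 | no recession)

p_rs = p_r * p_s_given_r               # joint P(R, S) = 12%
p_s = p_rs + (1 - p_r) * p_s_given_nr  # total P(S) = 42%
print(p_rs / p_s)                      # P(R | S) ~= 0.2857
```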


P1.T2.205. Sampling distributions (Stock & Watson)

AIM: Describe the key properties of the normal, standard normal, multivariate normal, Chi-squared, Student t, and F distributions.

205.1. Two variables each have a normal distribution: X = N(20,4) and Y = N(40,9). The correlation between X and Y is 0.20. Let X and Y be bivariate normal, and let J = 20*X + 28*Y (a linear combination, itself normally distributed). What is the Pr(1,355 < J < 1,753)?

a) 90.0% b) 93.0% c) 94.0% d) 96.0%

205.2. Each of the following is true about the student t distribution EXCEPT:

a) The student t distribution has skew equal to zero; variance equal to df/(df - 2) where (df) is degrees of freedom; and kurtosis greater than 3.0 (leptokurtosis with heavier tail and higher peak compared to the normal)

b) To test the significance of a single partial slope coefficient in a (sample) multiple regression with three independent variables (aka, regressors), we use a critical t with degrees of freedom (d.f.) equal to the sample size minus four (n - 4)

c) The student's t distribution is the distribution of the ratio of a standard normal random variable divided by the square root of an independently distributed chi-squared random variable with (m) degrees of freedom divided by (m)

d) For asset returns involving large sample sizes (for example, n = 1,000), the student t should be used to simulate heavy tails as asset returns exhibit heavy tails

205.3. Each of the following is true about the chi-square and F distributions EXCEPT:

a) The chi-square distribution is used to test a hypothesis about a sample variance; i.e., given an observed sample variance, is the true population variance different than a specified value?

b) As degrees of freedom increase, the chi-square approaches a lognormal distribution and the F distribution approaches a gamma distribution

c) The F distribution is used to test the joint hypothesis that the partial slope coefficients in a multiple regression are significant; i.e., is the overall multiple regression significant?

d) Given a computed F ratio, where F ratio = (ESS/df)/(SSR/df), and sample size (n), we can compute the coefficient of determination (R^2) in a multiple regression with (k) independent variables (regressors)


Answers: 205.1. C. 94.0% Please note N(20,4) signifies mean of 20 and variance of 4. In this case, Variance (J) = 20^2*4 + 28^2*9 + 2*20*28*SQRT(4)*SQRT(9)*0.2 = 10,000; such that: standard deviation (J) = 100 Expected value of J, E[J] = 20*20 + 28*40 = 1,520 As standardized variables are (1,355 - 1,520)/100 = -1.65 and (1,753 - 1,520)/100 = 2.33, we have: Pr (1,355 < J < 1,753) in standardized terms is Pr (-1.65 < N < 2.33) = 99.0% - 5.0% = 94.0%

Please note: for the exam, you cannot be expected to know the normal CDF, N(.), with two exceptions due to their common usage.

You simply must know:

N(-2.33) = 1% and N(-1.645) = 5%, and due to the symmetry of the normal of course,

N(1.645) = 95% and N(2.33) = 99%.

205.2. D. For large samples, the student t approximates the normal. For example, if n = 1,000, the 99% one-tailed critical t = 2.330083 compared to z = 2.32634. For large samples, the student t's excess kurtosis is immaterial and the student t is NOT useful for simulating heavy tails!

In regard to (A), (B), and (C), each is true.

205.3. B. False, both approach a normal distribution; all of the so-called sampling distributions (student t, chi-square, F) approach normal as d.f. --> inf.

In regard to (A), (C) and (D), each is true.
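A quick scipy check of the 205.1 arithmetic (our own verification, not part of the original answer):

```python
# Sketch: confirm 205.1 with scipy's normal CDF.
import math
from scipy.stats import norm

var_j = 20**2 * 4 + 28**2 * 9 + 2 * 20 * 28 * 2 * 3 * 0.2  # 10,000
mu_j = 20 * 20 + 28 * 40                                   # 1,520
sd_j = math.sqrt(var_j)                                    # 100
print(norm.cdf((1753 - mu_j) / sd_j) - norm.cdf((1355 - mu_j) / sd_j))
# ~0.940, i.e., about 94%
```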

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-205-sampling-distributions-stock-watson.5247/


P1.T2.206. Variance of sample average

AIMs: Define, calculate, and interpret the mean and variance of the sample average (Stock & Watson)

206.1. A portfolio is equally weighted among nine pairwise independent assets, each with identical volatility of 14.0%; i.e., n = 9, sigma(i) = 14%, weight of asset (i) = 1/9, and all correlations (i,j) = 0. If we add another independent asset with the same volatility of 14.0%, such that (n) increases from 9 to 10, and each asset weight dilutes to 1/10, what is the absolute change to portfolio volatility?

a) Zero b) Reduced by 0.24% (absolute) c) Reduced by 1.18% (absolute) d) Reduced by 2.48% (absolute)

206.2. A stock has an expected (i.i.d.) return of 9.0% per annum and volatility of 10% per annum. The distribution of the average (continuously compounded) rate of return has a mean of 8.5% per annum, as 9.0% - 10.0%^2/2 = 8.5%; i.e., over several years, the average realized return is expected to be 8.5% per year. Over a five-year horizon, we can be 95% confident that the realized average (i.e., per annum) return will exceed what level?

a) -7.95% b) 0.85% c) 1.14% d) 4.47%

206.3. A basket credit default swap (basket CDS) references one hundred (100) very risky credits. Each credit is characterized by a random Bernoulli variable and either defaults with probability of 9.0% or does not default. Further, the credits are uncorrelated; note these two assumptions satisfy i.i.d. as the distributions are identical and independent. In this way, the basket's expected average default rate is 9.0%. What is the 95% confidence interval, around this expected mean, for the basket's average default rate?

a) 3.4% < E[average default rate] < 14.6% b) 4.3% < E[average default rate] < 13.7% c) 5.2% < E[average default rate] < 12.6% d) 6.1% < E[average default rate] < 11.9%


Answers: 206.1. B. Reduced by 0.24% Variance (n = 9) = 14%^2/9, such that portfolio volatility = SQRT(14%^2/9) = 4.6667% portfolio volatility (n = 10) = SQRT(14%^2/10) = 4.4272% Improvement = 4.4272% - 4.6667% = -0.2395%. 206.2. C. 1.14% The AVERAGE return has a variance of 10%^2/5 = 0.0020; such that the AVERAGE return has volatility of SQRT(10%^2/5) = 4.47% The mean is 8.5% per year, such that 95% confidence bound is given by: 8.5% - 1.645*4.47% = 1.143% 206.3. A. 3.4% < E[average default rate] < 14.6% The variance of a single credit = p*(1-p) = 9%*(1-9%) such that the variance of the AVERAGE default rate of 100 i.i.d. credits is (9%*91%)/100 = 0.000819; The standard deviation of the average default rate = SQRT[(9%*91%)/100] = 2.86%. The 95% confidence interval is: 9.0% +/- 1.96*2.86%: 3.39% < average default rate < 14.61%

Please note: here the individual variable is non normal (Bernoulli). But that doesn't matter, CLT only requires i.i.d. and tells us that the average will tend toward (converge) a normal distribution as the sample increases.
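The 206.3 interval in a short Python sketch (our own check):

```python
# Sketch: 95% confidence interval for the basket's average default rate.
import math

p, n = 0.09, 100
se = math.sqrt(p * (1 - p) / n)  # std deviation of the average default rate
lo, hi = p - 1.96 * se, p + 1.96 * se
print(round(se, 4), round(lo, 4), round(hi, 4))  # 0.0286, 0.0339, 0.1461
```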

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-206-variance-of-sample-average.5271/


P1.T2.207. Law of large numbers and Central Limit Theorem (CLT)

AIMs: Define and describe random sampling and what is meant by i.i.d. Describe, interpret, and apply the Law of Large Numbers and the Central Limit Theorem (Stock & Watson)

207.1. Analyst Joe wants to apply the square root rule (SRR) to scale daily asset volatility into monthly asset volatility. For example, if the daily volatility is (D), then the scaled monthly volatility will be given by M = D*SQRT(20). Consider the following possible assumptions:

I. Each daily return has a normal distribution, although the mean and variance vary
II. Knowledge of today's return gives no information about tomorrow's return
III. Daily returns are autocorrelated (positive serial correlation)
IV. Each daily return is non-normal, with heavy tails, although the distributional moments are constant

Joe is informed that application of the square root rule (SRR) requires that returns are i.i.d. Therefore, which of the above assumptions is (are) necessary to legitimately scale the volatility?

a) I. only b) I. and II. c) II. and III. d) II. and IV.

207.2. Unrealistically but somehow, Analyst Sally has determined that the true population default rate of a single BB-rated bond is 1.0%; i.e., default is a Bernoulli with p(default) = 1.0%. She is analyzing a collateralized debt obligation (CDO) which references a sample size of (N) of these BB-rated bonds. She is interested in the default rate of the entire sample referenced by the CDO. Which of the following most nearly summarizes the law of large numbers?

a) As N increases, the number of defaults (D) must also increase
b) As N increases, the CDO's default rate of D/N will exceed 1.0% with a probability that tends toward 1.0
c) As N increases, the CDO's default rate of D/N will converge to 1.0%, but only if bond defaults are i.i.d.
d) As N increases, the CDO's default rate of D/N will converge to 1.0% and no assumptions are required


207.3 Portfolio Manager Roger is analyzing a universe of potential stocks; unrealistically, it happens to be that each stock offers identical expected returns (mu) of 6.0% and identical standard deviations (sigma) of 18.0%. Roger wants to invest in a sample size of (N) of the stocks. Let (R) equal the average return of the invested sample. Roger argues that, according to the Central Limit Theorem, as (N) increases, the distribution of [(R - 6.0%)/SQRT(18%^2/N)] becomes increasingly well approximated by the standard normal distribution. Which of the following best characterizes Roger's argument about the CLT?

a) Roger is correct and no further conditions are required
b) Roger is correct, but only if he can assume that the stock returns are characterized by a normal distribution
c) Roger is correct, if he can also assume independence among returns, but the stock returns do NOT need to be normally distributed
d) Roger is plainly incorrect; his argument is unrelated to the CLT


Answers: 207.1. D. II. and IV. Normality is not required. The square root rule (SRR) assumes i.i.d. returns. "Knowledge of today's return gives no information about tomorrows return" signifies INDEPENDENCE. "Distributional moments are constant" signifies IDENTICAL DISTRIBUTIONS. 207.2. C. As N increases, the CDO's default rate of D/N will converge to 1.0%, if bond defaults are i.i.d. 207.3 C. Roger is correct, if he can also assume independence among returns, but the stock returns do NOT need to be normally distributed The CLT does not require the population distribution to be normal. Rather, it requires i.i.d. and finite variance. Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-207-law-of-large-numbers-and-central-limit-theorem-clt.5283/
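The CLT claim in 207.3 is easy to see by simulation. A minimal Python sketch with a deliberately non-normal (two-point) return distribution; the sample size, trial count, and seed are arbitrary assumptions:

```python
# Sketch: Monte Carlo illustration of the CLT claim in 207.3.
# Draws are i.i.d. but non-normal: a two-point distribution taking
# mu +/- sigma with equal probability (mean = 6%, sigma = 18%).
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n, trials = 0.06, 0.18, 500, 10_000
draws = mu + sigma * rng.choice([-1.0, 1.0], size=(trials, n))
# standardized sample mean, per Roger's expression
z = (draws.mean(axis=1) - mu) / (sigma / np.sqrt(n))
print(z.mean().round(3), z.std().round(3))  # ~0 and ~1, as the CLT predicts
```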


Statistics - Key Ideas

We mostly do not observe population parameters but instead infer them from sample estimates which are values given by estimators such as sample mean and sample variance. An estimator is a “recipe” for obtaining an estimate of a population parameter.

The sample mean is BLUE: Best Linear Unbiased Estimator

The t-statistic tests the null hypothesis that the population mean equals a certain value.

If the sample (n) is large (e.g., greater than 30), the t-statistic has a standard normal sampling distribution when the null hypothesis is true.

A common test is to test the significance of a regression coefficient. While the specifics vary, in many cases here the null is “the slope coefficient is zero.”

The p-value is an “exact (aka, marginal) significance level:” it is the probability of drawing a statistic at least as adverse to the null hypothesis as the one actually computed (observed), assuming the null hypothesis is correct.

p-value is the smallest significance level at which the null can be rejected

If the p-value is very small (e.g., 0.00x), reject the null. If the p-value is large (e.g., 0.19 or 19%), accept (fail to reject) the null.

You will NOT be asked, on the FRM, to calculate a p-value (e.g., you cannot derive it on the TI BA II+ or HP 12c). You may be asked to interpret a given p-value.

A 95% confidence interval for the population mean, mu(Y), is an interval constructed so that it contains the true value of mu(Y) in 95% of all possible samples:

90% CI for mu(Y) = sample mean(Y) +/- 1.64 * SE[sample mean(Y)]

95% CI for mu(Y) = sample mean(Y) +/- 1.96 * SE[sample mean(Y)]

99% CI for mu(Y) = sample mean(Y) +/- 2.58 * SE[sample mean(Y)]

Where SE is the standard error = sample standard deviation / SQRT(n) = SQRT(sample variance / n)

Sample covariance: s(XY) = SUM[ (X(i) - mean(X)) * (Y(i) - mean(Y)) ] / (n - 1)

Sample correlation: r(XY) = s(XY) / [s(X) * s(Y)]

That is, Correlation(X,Y) = Covariance(X,Y) / [Std Deviation(X) * Std Deviation(Y)]


Statistics (Miller, Chapter 3)

P1.T2.303. Mean and variance of continuous probability density functions (pdf)
P1.T2.304. Covariance (Miller)
P1.T2.305. Minimum variance hedge (Miller)
P1.T2.306. Calculate the mean and variance of sums of variables
P1.T2.307. Skew and Kurtosis (Miller)
P1.T2.308. Coskewness and cokurtosis

P1.T2.303 Mean and variance of continuous probability density functions (pdf)

AIMs: Define, calculate, and interpret the mean, standard deviation, and variance of a random variable.

303.1. Assume a continuous probability density function (pdf) is given by f(x) = a*x such that 0 ≤ x ≤ 12, where (a) is a constant (we can retrieve this constant, knowing this is a probability density function):

What is the mean of (x)?

a) 5.5 b) 6.0 c) 8.0 d) 9.3

303.2. Assume a continuous probability density function (pdf) is given by f(x) = a*x^2 such that 0 ≤ x ≤ 3, where (a) is a constant (that we can find).

Let us arbitrarily define the unexpected loss (UL) as the difference between this distribution's mean and its 5.0% quantile function; i.e., UL(X) = mean (X) - inverse CDF (5%)(X). We could call this a 95% relative VaR since it is relative to the mean. What is this UL?

a) 0.62 b) 1.14 c) 2.05 d) 3.37


303.3. Assume the following probability density function (pdf) for a random variable X: f(x) = x/18 such that 0 ≤ x ≤ 6.

What is the variance of X?

a) 2.0 b) 3.3 c) 4.1 d) 5.7


Answers:

303.1. C. 8.0
If this is a valid pdf, then a*(1/2)*x^2 evaluated over [0,12] must equal one: a*(1/2)*12^2 = 1.0, and a = 1/72. Therefore the pdf is given by f(x) = x/72 over the domain [0,12].
The mean = integral of x*f(x) = integral of x^2/72 over [0,12] = x^3/216 evaluated over [0,12] = 12^3/216 = 8.0

303.2. B. 1.14
If this is a valid pdf, then a*(1/3)*x^3 evaluated over [0,3] must equal one: a*(1/3)*3^3 = 1.0, and a = 3/27 = 1/9. Therefore the pdf is given by f(x) = x^2/9 over the domain [0,3].
The mean (mu) = integral of x*f(x) = integral of x^3/9 over [0,3] = (1/9)*(1/4)*x^4 evaluated over [0,3] = 3^4/36 = 9/4.
For the 5% quantile, we need the value (m) such that the integral of f(x)*dx over [0,m] = 0.05. So we need (m) such that the integral of x^2/9*dx over [0,m] = 0.05; i.e., x^3/27 evaluated over [0,m] = 0.05, and m^3/27 = 0.05, therefore m = (27*0.05)^(1/3) = 1.11
UL(5%) = mean - 1.11 = 9/4 - 1.11 = 1.14

303.3. A. 2.0
Mean (mu) = integral of x*f(x) = integral of x^2/18 over [0,6] = (1/54)*x^3 evaluated over [0,6] = 6^3/54 = 4.0
Variance = integral of (x - mu)^2*f(x)*dx over [0,6] = (1/18)*[x^4/4 - 8*x^3/3 + 8*x^2] evaluated over [0,6] = (1/18)*[6^4/4 - 8*6^3/3 + 8*6^2] = 2.0

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-303-mean-and-variance-of-continuous-probability-density-functions-pdf.6783/
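If you would like to verify these integrals numerically, here is a minimal Python sketch (our own illustration, not part of the original answers) that uses a plain midpoint Riemann sum, so no external libraries are needed:

```python
# Midpoint Riemann sum: a simple way to check the pdf integrals above
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# 303.1: f(x) = x/72 on [0, 12]; the mean is the integral of x*f(x)
pdf1 = lambda x: x / 72
print(integrate(lambda x: x * pdf1(x), 0, 12))   # ~8.0

# 303.2: f(x) = x^2/9 on [0, 3]; mean = 9/4, 5% quantile = (27*0.05)^(1/3)
pdf2 = lambda x: x ** 2 / 9
mean2 = integrate(lambda x: x * pdf2(x), 0, 3)   # ~2.25
m = (27 * 0.05) ** (1 / 3)                       # ~1.11
print(mean2 - m)                                 # UL ~1.14
```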


P1.T2.304. Covariance (Miller)

AIM: Define, calculate, and interpret the covariance and correlation between two random variables.

304.1. Two assets, X and Y, produce only three joint outcomes: Prob[X = -3.0%, Y = -2.0%] = 30%, Prob[X = +1.0%, Y = +2.0%] = 50%, and Prob[X = +5.0%, Y = +3.0%] = 20%:

What is the correlation between X & Y? (Bonus question: if we removed the probabilities and instead simply treated the three sets of returns as a small, [tiny actually!] historical sample, would the sample correlation be different?)

a) 0.6330 b) 0.7044 c) 0.8175 d) 0.9286

304.2. Each of the random variables X and Y can have two outcomes. The following probability matrix gives their joint probabilities: Prob[X = 4, Y = 3] = 30%, Prob[X = 7, Y = 3] = 15%, Prob[X = 4, Y = 5] = 25%, and Prob[X = 7, Y = 5] = 30%.

For example, the joint Prob[X = 4.0, Y = 3.0] = 30%. What is the covariance between X and Y?

a) -0.9727 b) 0.3150 c) 1.4842 d) 4.9224


304.3. Let X be a discrete uniform random integer in the set {1, 2, 3, 4, 5} with equal probability of each outcome and let Y = (X+1)^2:

Y = (X + 1)^2 such that X is uniform random {1, 2, 3, 4, 5}

What is the covariance between X & Y?

a) 5.5 b) 9.0 c) 16.0 d) 25.0


Answers:

304.1. D. 0.9286
As Covariance(X,Y) = 0.0520%, StdDev(X) = 2.8%, and StdDev(Y) = 2.0%, correlation = 0.0520%/(2.8%*2.0%) = 0.9286. See the snapshot below; some key points:

Variances: first row of Variance(X) = (-3.0% - 0.60%)^2*30%, and variance(X) is sum of the three probability-weighted squared deviations: 0.0784% = 0.0389%+0.0008%+0.0387%

Covariance Method 1: first row of Covariance (Method 1) = (-3.0%-0.60%)*(-2.0%-1.0%)*30%; then Covariance (M 1) is sum of three rows.

Covariance Method 2: first row of Covariance (Method 2) = -3.0%*-2.0%*30% = 0.0180%, second row = 1.0%*2.0%*50% = 0.010%. Covariance (M2) = 0.0580% - 0.60%*1.0% = 0.0520%. This employs the highly useful Cov(x,y) = E[xy] - E[x]*E[y], which includes the special case Cov(x,x) = Variance(x) = E[x^2] - (E[x])^2

Spreadsheet at https://www.dropbox.com/s/x243p9cix2efx5v/T2.304.1_covariance.xlsx


If we removed the probabilities and treated the returns as a (very) small historical sample, the sample correlation would be different, at 0.945. There are two reasons:

1. The historical sample (by default) treats the observations as equally-weighted; and,

2. A sample correlation divides the sample covariance by sample standard deviations, where (n-1) is used in the denominator instead of (n).

In this way the sample covariance is larger (in absolute value), ceteris paribus, than a population-type covariance, and so are the sample standard deviations.

304.2. B. 0.3150
The E[X*Y] = 4*3*30% + 7*3*15% + 4*5*25% + 7*5*30% = 22.250;
The E[X] = 4*(30%+25%) + 7*(15%+30%) = 5.350, and the E[Y] = 3*(30%+15%) + 5*(25%+30%) = 4.10, such that:
Cov[X,Y] = E[X*Y] - E[X]*E[Y] = 22.250 - 5.350*4.10 = 0.3150.

304.3. C. 16.0
The E[X*Y] = [1*(1+1)^2 + 2*(2+1)^2 + 3*(3+1)^2 + 4*(4+1)^2 + 5*(5+1)^2]/5 = 350/5 = 70
The E[X] = 3 and the E[Y] = 18, so that: Cov[X,Y] = 70 - (3*18) = 16.0

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-304-covariance-miller.6791/
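As a quick check of the Cov(x,y) = E[xy] - E[x]*E[y] shortcut, here is a minimal Python sketch (ours, for illustration only) that reproduces the 304.2 answer from the joint probability matrix:

```python
# Joint probabilities from 304.2: keys are (x, y) outcomes, values are probs
joint = {(4, 3): 0.30, (7, 3): 0.15, (4, 5): 0.25, (7, 5): 0.30}

e_xy = sum(x * y * p for (x, y), p in joint.items())
e_x = sum(x * p for (x, _y), p in joint.items())
e_y = sum(y * p for (_x, y), p in joint.items())

# Cov[X,Y] = E[XY] - E[X]*E[Y]
print(e_xy - e_x * e_y)  # 0.3150
```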


P1.T2.305. Minimum variance hedge (Miller)

AIMs: Interpret and calculate the variance for a portfolio and understand the derivation of the minimum variance hedge ratio.

305.1. A two-asset portfolio contains a long position in commodity (T) with volatility of 10.0% and a long position in stock (S) with volatility of 30.0%. The assets are uncorrelated: rho(T,S) = zero (0). What weight (0 to 100%) of the portfolio should be allocated to the commodity if the goal is a minimum variance portfolio (in percentage terms, as no dollars are introduced)?

a) 62.5% b) 75.0% c) 83.3% d) 90.0%

305.2. A portfolio manager owns (has a long position in) $100 of Security A which has a volatility of 9.0%. She wants to hedge with Security B which has a volatility of 15.0%. The correlation between the securities is 0.40; rho(A,B) = +0.40. What position in Security B utilizes the minimum variance hedge ratio to create a portfolio with the minimum dollar ($) standard deviation?

a) Long $60.0 b) Short $24.0 c) Short $60.0 d) Short $76.0

305.3. A portfolio of $3.00 is equally invested in three securities (A, B and C), such that $1.00 is conveniently invested in each security, where the relationship between the securities is characterized by the following variance-covariance matrix:

For example, the volatility of security A's returns is SQRT(0.040) = 20% and the covariance (B,C) = 0.120. What is the portfolio's dollar standard deviation?

a) $0.41 b) $0.55 c) $0.82 d) $1.34


Answers:

305.1. D. 90.0%
If w = weight in the commodity, the two-asset portfolio variance, VARP = w^2*10%^2 + (1-w)^2*30%^2 + 2*0*w*(1-w)*10%*30% = w^2*0.01 + (1-w)^2*0.09. We want the value of (w) that minimizes the portfolio variance, so we take the first derivative with respect to w: dVARP/dw = d[w^2*0.01 + 0.09*(1 - 2*w + w^2)]/dw = d[w^2*0.01 + 0.09 - 0.18*w + 0.09*w^2]/dw = 0.02*w - 0.18 + 0.18*w = 0.20*w - 0.18. To find the local minimum, we set the first derivative equal to zero and solve for w: let 0 = 0.20*w - 0.18, such that w = 0.18/0.20 = 90.0%. A portfolio with 90% weight in the commodity and 10% in the stock will have the lowest variance at 0.0090, which is equal to a standard deviation of SQRT(0.0090) = 9.486%; i.e., lower than either of the asset volatilities. Of course, this optimal mix is variant to changes in the correlation. The first derivative can be taken of the generic two-asset portfolio variance, such that the minimum variance weight in the first asset is given by: w(minimum) = (sigma2^2 - rho*sigma1*sigma2) / (sigma1^2 + sigma2^2 - 2*rho*sigma1*sigma2).

305.2. B. Short $24.0
The minimum variance hedge ratio, h* = -rho*volatility(A)/volatility(B). In this case, h* = -0.40*9.0%/15.0% = -0.24, such that the trade is short 0.24 for each $1.0 in Security A. Therefore, -0.24 * $100 in Security A = short $24.00. And, indeed, this portfolio's standard deviation = SQRT[$100^2*9%^2 + (-$24)^2*15%^2 + 2*(0.40)*$100*(-$24)*9%*15%] = $8.2486, which is the minimum dollar standard deviation.


305.3. C. $0.82
The short method is to realize that the vector of positions is [$1 $1 $1], such that the dollar portfolio variance is conveniently the sum of the cells of the variance-covariance matrix: 0.6800, and the dollar standard deviation is SQRT(0.680) = $0.825. This is due to the dollar-position form of the portfolio variance, variance($) = SUM over (i,j) of x(i)*x(j)*Cov(i,j), where (x) denotes a dollar position. Alternatively, we can employ matrix notation where the weights are each 1/3 = 33.3% and the portfolio variance (in percentage terms) = w^T*D*w, where D is the variance-covariance matrix. The resulting variance is 7.56%, so the standard deviation (%) = SQRT(7.56%) = 27.49%, and the dollar standard deviation of the portfolio = $3.00 * 27.49% = $0.8246.

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-305-minimum-variance-hedge-miller.6800/
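To make the 305.2 mechanics concrete, here is a minimal Python sketch (our own illustration; the variable names are ours) of the minimum variance hedge ratio and the resulting dollar standard deviation:

```python
# 305.2: long $100 in A (sigma 9%), hedge with B (sigma 15%), rho = 0.40
pos_a, sigma_a, sigma_b, rho = 100.0, 0.09, 0.15, 0.40

h_star = -rho * sigma_a / sigma_b   # minimum variance hedge ratio = -0.24
pos_b = h_star * pos_a              # short $24 of Security B

# Dollar variance of the hedged two-position portfolio
var_d = (pos_a**2 * sigma_a**2 + pos_b**2 * sigma_b**2
         + 2 * rho * pos_a * pos_b * sigma_a * sigma_b)
print(pos_b, var_d ** 0.5)          # -24.0, ~$8.25
```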


P1.T2.306. Calculate the mean and variance of sums of variables.

306.1. In credit risk (Part 2) of the FRM, a single-factor credit risk model is introduced. This model gives a firm's asset return, r(i), by the following sum of two components:

r(i) = a(i)*F + SQRT(1 - a(i)^2)*e(i)

In this model, a(i) is a constant, while (F) and epsilon (e) are random variables. Specifically, (F) and (e) are standard normal deviates with, by definition, mean of zero and variance of one ("unit variance"). If the value of a(i) is 0.750 and the covariance[F,e(i)] is 0.30, which is nearest to the variance of the asset return, variance[r(i)]?

a) 0.15 b) 1.30 c) 1.47 d) 1.85

306.2. A two-asset portfolio, (P), has a 60% long position in Safe Stock, (S), which has a volatility of 20.0%, and a 40% long position in Risky Stock, (R), which has a volatility of 35.0%. Their return correlation is 0.40. Marginal value at risk (marginal VaR; a Part 2 FRM concept) employs the beta of a position with respect to the portfolio that contains it; i.e., beta[Position, its Portfolio]. In this example, we can refer to the beta[S, P], which is like any beta given by covariance[S, P]/variance[P] if we are careful to note that P = (60%*S) + (40%*R). Which is nearest to the beta of Safe Stock with respect to its portfolio, beta[S, P]?

a) -0.25 b) +0.49 c) +0.74 d) +0.93


306.3. The following single-period, single-factor index model characterizes an asset return, R(i), as a function of a market index variable, X(i):

R(i) = alpha + beta*X(i) + e(i)

In this model, alpha and beta are constants. However, like X(i), epsilon e(i) is an important random variable: it captures the asset's specific risk. Assume the following:

Constant (alpha) = 2.00%

Constant (beta) = 0.60

E[X] = 8.00% and variance(X) = 10%^2 = 0.010

E[e] = 0 and variance(e) = 20%^2 = 0.040

Covariance(X,e) = 0.0040

If the 95% absolute value at risk (VaR) is defined as VaR = -E[R] + 1.65*Standard Deviation[R], what is the 95% absolute VaR?

a) 8.39% b) 17.88% c) 29.50% d) 38.47%


Answers:

306.1. B. 1.30
var(x+y) = var(x) + var(y) + 2*cov(x,y). In this case, x = a*F and y = SQRT(1-a^2)*e, such that:
var[a*F + SQRT(1-a^2)*e] = var(a*F) + var[SQRT(1-a^2)*e] + 2*cov[a*F, SQRT(1-a^2)*e]
= a^2*var(F) + (1-a^2)*var(e) + 2*cov[a*F, SQRT(1-a^2)*e], and since var(F) and var(e) = 1.0, this is equal to:
= a^2*1.0 + (1-a^2)*1.0 + 2*cov[a*F, SQRT(1-a^2)*e]
= 1.0 + 2*cov[a*F, SQRT(1-a^2)*e]
= 1.0 + 2*a*SQRT(1-a^2)*cov[F,e(i)]; and per cov(a*x,b*y) = a*b*cov(x,y):
= 1.0 + 2*0.75*SQRT(1-0.75^2)*0.30 = 1.2976

Please note: in the actual single-factor model, the assumption is that covariance[F,e(i)] = 0, such that, unlike this question, the variance of the asset return in the actual model is 1.0 by design: the term cov[a*F, SQRT(1-a^2)*e] equals zero REGARDLESS of the value of (a), since a^2 + (1 - a^2) = 1.0.

306.2. C. +0.74
Recall the two-asset portfolio P = (60%*S) + (40%*R), such that: Cov(S,P) = Cov(S, 0.6*S + 0.4*R) = 0.60*Cov(S,S) + 0.4*Cov(S,R) = 0.60*variance(S) + 0.4*Cov(S,R). In this case, Cov(S,P) = 0.60*20%^2 + 0.4*20%*35%*0.4 = 0.03520. The portfolio variance = 60%^2*20%^2 + 40%^2*35%^2 + 2*60%*40%*20%*35%*0.40 = 0.047440; therefore: Beta(S,P) = Cov(S,P)/variance(P) = 0.03520/0.04744 = 0.7420. The greater a position's weight in the portfolio, the more that beta[S,P] tends toward 1.0.

306.3. C. 29.50%
E[R] = alpha + beta*E[X] + E[e] = 2% + 0.60*8% = +6.80%; a.k.a., drift.
Variance[R] = variance(beta*X) + variance(e) + 2*covariance(beta*X, e) = beta^2*variance(X) + variance(e) + 2*beta*covariance(X,e). In this case, = 0.60^2*0.010 + 0.040 + 2*0.60*0.0040 = 0.04840
Standard Deviation[R] = SQRT(0.04840) = 22.00%, and VaR = -E[R] + 1.65*Standard Deviation[R] = -6.80% + 1.65*22.00% = 29.50%

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-306-calculate-the-mean-and-variance-of-sums-of-variables.6810/
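Here is a minimal Python sketch (ours, purely to check the 306.3 arithmetic) of the mean and variance of a sum with a nonzero cross-covariance term:

```python
# 306.3: R = alpha + beta*X + e, with cov(X, e) nonzero
alpha, beta = 0.02, 0.60
mean_x, var_x = 0.08, 0.010
var_e, cov_xe = 0.040, 0.0040

mean_r = alpha + beta * mean_x                        # 6.80% drift
var_r = beta**2 * var_x + var_e + 2 * beta * cov_xe   # 0.04840
VaR95 = -mean_r + 1.65 * var_r ** 0.5                 # 29.50%
print(mean_r, var_r, VaR95)
```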


P1.T2.307. Skew and Kurtosis (Miller)

AIMs: Describe the four central moments of a statistical variable or distribution: mean, variance, skewness and kurtosis.

307.1. A bond has a default probability of 5.0%. Which is nearest, respectively, to the skew (S) and kurtosis (K) of the distribution?

a) S = 0.0, K = 2.8 b) S = 0.8, K = -7.5 c) S = 4.1, K = 18.1 d) S = 18.9, K = 4.2

307.2. Assume a discrete uniform random variable (X) can assume one of three outcomes {1, 2, or 6} with equal probability of 1/3rd each; which is to say, this is not a sample but a (population's) probability distribution. Which is nearest to this distribution's skew?

a) -0.37 b) +0.60 c) +1.44 d) +2.79

307.3. Let (X) be a random variable with three outcomes: Prob[X=1] = 25%, Prob[X=2] = 50%, and Prob[X=3] = 25%. Which is nearest to the kurtosis of this distribution?

a) 0.33 b) 2.00 c) 3.50 d) 4.75


Answers:

307.1. C. S = 4.1, K = 18.1
Let X = 0 with prob 95.0% and X = 1 with prob 5.0%, such that mean(X) = 0.050:
3rd central moment = (1-0.05)^3*5% + (0-0.05)^3*95% = 0.04275, such that skew = 0.04275/(5%*95%)^(3/2) = 4.1295 (or -4.1295, depending on which outcome is coded as one)
4th central moment = (1-0.05)^4*5% + (0-0.05)^4*95% = 0.04073, such that kurtosis = 0.04073/(5%*95%)^2 = 18.053; i.e., excess kurtosis = 18.053 - 3.0 = 15.053

307.2. B. +0.60
The mean is (1+2+6)/3 = 3.0. The 3rd central moment = [(1-3)^3 + (2-3)^3 + (6-3)^3]/3 = 6.0. As the variance is 4.67, the skewness = 3rd moment/variance^(3/2) = 6.0/4.67^(3/2) = 0.5952.
Note we can also apply Miller's Equation 3.45 for the 3rd central moment: E[X^3] - 3*mu*variance - mu^3 = 75 - 3*3*4.667 - 3^3 = 6.00

307.3. B. 2.00
The mean of X = 1*25% + 2*50% + 3*25% = 2.0. The variance of X = (1-2)^2*25% + (2-2)^2*50% + (3-2)^2*25% = 0.50
The 4th central moment = (1-2)^4*25% + (2-2)^4*50% + (3-2)^4*25% = 0.50
The kurtosis of X = 4th central moment/variance^2 = 0.50/0.50^2 = 2.00

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-307-skew-and-kurtosis-miller.6825/
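The same central-moment recipe generalizes to any discrete distribution. Here is a minimal Python sketch (ours, for illustration) that reproduces the 307.3 numbers:

```python
# 307.3: X = 1, 2, 3 with probabilities 25%, 50%, 25%
outcomes = {1: 0.25, 2: 0.50, 3: 0.25}

mu = sum(x * p for x, p in outcomes.items())

def central_moment(k):
    # E[(X - mu)^k], probability-weighted
    return sum((x - mu) ** k * p for x, p in outcomes.items())

var = central_moment(2)
skew = central_moment(3) / var ** 1.5
kurt = central_moment(4) / var ** 2
print(mu, var, skew, kurt)  # 2.0, 0.5, 0.0, 2.0
```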


P1.T2.308. Coskewness and cokurtosis

AIMs: Interpret skewness and kurtosis of a statistical distribution, and interpret the concepts of coskewness and cokurtosis. Define and interpret best linear unbiased estimator (BLUE).

308.1. The three joint outcomes of two variables (X,Y) are characterized by the following probability distribution:

Which is nearest to the co-skew (XXY) and co-kurtosis (XYYY) of this bivariate distribution (please note this question is clearly more difficult than an exam-level question; it is meant to give concrete practice to the concept)?

a) Co-skew(XXY) = -0.245, Co-kurtosis(XYYY) = -7.931 b) Co-skew(XXY) = -0.245, Co-kurtosis(XYYY) = 1.6250 c) Co-skew(XXY) = +0.411, Co-kurtosis(XYYY) = +3.027 d) Co-skew(XXY) = +1.588, Co-kurtosis(XYYY) = +5.682

308.2. In regard to skew, kurtosis, co-skew and co-kurtosis, each of the following is true EXCEPT which is technically false (this is a difficult question meant for training purposes)?

a) The coskew between A and B, S(AAB) = E[(A-mu[A])^2*(B-mu[B])], where mu[A] is the mean of (A)

b) In the case of a bivariate distribution between two (2) random variables, we can compute two (2) nontrivial co-skew and three (3) nontrivial cokurtosis statistics

c) If a univariate population skew is adjusted to its sample-based equivalent, for a given (n), the sample skew might be greater or less than the population skew

d) For ten (10) random variables, there are 45 non-trivial second (2nd) cross central moments, where trivial refers to covariance(X,X); i.e., the variances


308.3. Analyst Rob has identified an estimator, denoted T(.), which qualifies as the best linear unbiased estimator (BLUE). If T(.) is BLUE, which of the following must also necessarily be TRUE?

a) T(.) must have the minimum variance among all possible estimators
b) T(.) must be the most efficient (the "best") among all possible estimators
c) It is possible that T(.) is the maximum likelihood (MLE) estimator of variance; i.e., SUM([X - average(X)]^2)/n
d) Among the class of unbiased estimators that are linear, T(.) has the smallest variance


Answers:

308.1. B. Co-Skew(XXY) = -0.245, Co-Kurtosis(XYYY) = 1.6250
See spreadsheet here at https://www.dropbox.com/s/qj554pf4dd5teb3/t2.308.1%20coskew_cokurt.xlsx?dl=0

308.2. A. E[(A-mu[A])^2*(B-mu[B])] is the third central cross-moment; it needs to be divided by variance(A)*StandardDeviation(B) in order to be standardized into the coskew.

In regard to (B), this is true: given X and Y, co-skew can be either S(XXY) or S(XYY), and co-kurtosis can be either K(XXYY), K(XXXY), or K(XYYY).

In regard to (C), this is true but difficult: sample skew will adjust both the numerator and denominator such that positive sample skew will tend to be lower but negative sample skew will tend to be higher; i.e., sample skew will tend to adjust toward zero.

In regard to (D), did you realize this was only asking about a covariance matrix with the diagonal excluded (the diagonal holds the "trivial" variances)? If n = 10, the number of entries in the triangle = n*(n+1)/2 = 10*11/2 = 55; less the 10 in the diagonal = 45 unique covariance entries excluding the variances in the diagonal.


308.3. D. TRUE: Among the class of unbiased estimators that are linear, T(.) has the smallest variance. To be "best" is to be efficient, and to be efficient is to be the estimator with the lowest variance among unbiased estimators. BLUE adds the linearity requirement, such that BLUE is the minimum variance among the linear estimators that are unbiased.

In regard to (A), this is false (not necessarily true) because a biased estimator can have a smaller variance.

In regard to (B), this is false (not necessarily true) because another unbiased estimator can be more efficient than T(.) due to a smaller variance, yet be non-linear; i.e., within the class of unbiased estimators, the smallest variance may be found among either linear or non-linear estimators.

In regard to (C), this is false because the MLE estimator of variance is biased; dividing by (n-1) instead of (n) gives us a sample variance estimator that is unbiased (although its square root is, strangely, not unbiased). With respect to sample variance, our choice is to divide by (n) for the biased MLE or divide by (n-1) for the unbiased estimator. Also, please note this estimator is not BLUE due to non-linearity.

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-308-coskewness-and-cokurtosis.6836/
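Because the actual 308.1 probability table lives in the linked spreadsheet, here is a minimal Python sketch (ours) of the standardized co-skew and co-kurtosis calculations, reusing the joint outcomes from 304.1 as stand-in data (so the printed values will differ from the 308.1 answer):

```python
# Joint outcomes borrowed from 304.1 (NOT the 308.1 table, which lives in
# the spreadsheet), so the printed values differ from the 308.1 answer.
joint = {(-0.03, -0.02): 0.30, (0.01, 0.02): 0.50, (0.05, 0.03): 0.20}

mu_x = sum(x * p for (x, _y), p in joint.items())
mu_y = sum(y * p for (_x, y), p in joint.items())
var_x = sum((x - mu_x) ** 2 * p for (x, _y), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (_x, y), p in joint.items())

# Standardized cross moments: S(XXY) divides by sigma(X)^2 * sigma(Y);
# K(XYYY) divides by sigma(X) * sigma(Y)^3
s_xxy = sum((x - mu_x) ** 2 * (y - mu_y) * p for (x, y), p in joint.items())
k_xyyy = sum((x - mu_x) * (y - mu_y) ** 3 * p for (x, y), p in joint.items())
print(s_xxy / (var_x * var_y ** 0.5))
print(k_xyyy / (var_x ** 0.5 * var_y ** 1.5))
```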


Statistics (Stock & Watson Chapter 3)

P1.T2.208. Sample mean estimators (Stock & Watson)
P1.T2.209. T-statistic and confidence interval
P1.T2.210. Hypothesis testing (Stock & Watson)
P1.T2.211. Type I and II errors and p-value (Stock & Watson)
P1.T2.212. Difference between two means
P1.T2.213. Sample variance, covariance and correlation (Stock & Watson)

P1.T2.208. Sample mean estimators (Stock & Watson)

AIMS: Describe and interpret estimators of the sample mean and their properties. Describe and interpret the least squares estimator. Describe the properties of point estimators: Distinguish between unbiased and biased estimators. Define an efficient estimator and consistent estimator.

208.1. A random sample, drawn from a population with unknown mean and variance, includes the following six outcomes: 3, 6, 6, 8, 9, 10. Please note: "random sample" implies independent and identically distributed (i.i.d.). Each of the following is TRUE EXCEPT:

a) The sample variance is 6.40
b) The standard error of the sample mean is 2.61
c) The standard error of the sample mean is an estimator of the standard deviation of the sample mean
d) The sample variance employs a degrees of freedom correction (n-1). However, even for this small sample, the standard error of the sample mean uses (n) or SQRT(n) in the denominator and therefore does not itself employ a degrees of freedom correction.

208.2. We assume there is a population mean for the monthly return of hedge funds that employ a certain strategy (e.g., market neutral funds in 2011). A sample of hedge fund returns is collected and the sample mean return is +1.0% per month. In regard to this sample mean as an estimator, each of the following is true EXCEPT which is false?

a) If the returns are not a random sample (i.e., are not i.i.d.), the sample mean may be a biased estimator of the population mean

b) If the returns are a random sample (i.i.d.), the sample mean is the Best Linear Unbiased Estimator (BLUE)

c) If the returns are a random sample (i.i.d.), the sample mean is the least squares estimator

d) If the returns are a random sample (i.i.d.), the property of consistency implies that the variance of the sample mean is smaller than the variance of alternative estimators of the population mean


208.3. A backtest of a 99.0% value at risk (VaR) model over two years observes 8 exceptions in 500 trading days; i.e., the VaR loss threshold was exceeded on 1.6% of the days but the model was calibrated to expect losses in excess of the VaR for only 5 days (1.0%). Please note that we assume exceptions (exceedances) are i.i.d. with a Bernoulli distribution. What is, respectively, the standard error of the sample mean and the t-statistic? (Bonus for finding the p-value, which cannot be done with most calculators)

a) 0.39% (s.e.) and 0.88 t-statistic b) 0.47% (s.e.) and 1.03 t-statistic c) 0.56% (s.e.) and 1.07 t-statistic d) 0.62% (s.e.) and 1.65 t-statistic


Answers:

208.1. B. The standard error of the sample mean is 1.033, not 2.61.
Sample mean = 7. Sample variance = [(3-7)^2 + (6-7)^2 + (6-7)^2 + (8-7)^2 + (9-7)^2 + (10-7)^2] / (6-1) = 32/5 = 6.40
Sample standard deviation = SQRT[sample variance] = 2.5298
Standard error = SQRT[sample variance/n] = SQRT(6.4/6) = sample standard deviation/SQRT(n) = 1.033

In regard to (A), (C), and (D), each is TRUE.

208.2. D. Consistency says that, as the sample size increases, the estimator converges toward the population mean; an estimator with a smaller variance is more EFFICIENT. Unbiased says the E[estimator] = population parameter.

In regard to (A), (B) and (C), each is TRUE.

208.3. C. 0.56% (s.e.) and 1.07 t-statistic
The standard error of the sample mean = SQRT[1.6%*(1-1.6%)/500] = 0.56114%
The t-statistic or t-ratio = (1.6% - 1.0%)/0.56114% = 1.0692; i.e., the null hypothesis is that the VaR model is accurate, such that we expect p = 1.0%.
The p-value = 2*[1 - normal CDF(z = 1.0692)] = 2*[1 - 85.75%] = 28.5%; i.e., we would fail to reject the null hypothesis that the VaR model is accurate. Put another way, 1.6% could exceed an accurate VaR (i.e., 1%) due merely to random sampling variation, as we could expect this outcome fully 28.5% of the time.

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-208-sample-mean-estimators-stock-watson.5302/
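Although the exam will not ask you to compute the p-value, it is easy to reproduce in Python. Here is a minimal sketch (ours) of the 208.3 Bernoulli standard error, t-statistic, and bonus p-value, using the standard normal CDF via math.erf:

```python
import math

# 208.3: 8 exceptions in 500 days vs. an expected 1.0% exception rate
n, p_hat, p_null = 500, 8 / 500, 0.01

se = (p_hat * (1 - p_hat) / n) ** 0.5        # ~0.56%
t = (p_hat - p_null) / se                    # ~1.07

# Two-sided p-value via the normal CDF, Phi(z) = 0.5*(1 + erf(z/sqrt(2)))
p_value = 2 * (1 - 0.5 * (1 + math.erf(t / math.sqrt(2))))
print(se, t, p_value)                        # ~0.0056, ~1.07, ~0.285
```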


P1.T2.209. T-statistic and confidence interval

AIMs: Define, interpret, and calculate the t-statistic. Define, calculate, and interpret a confidence interval.

209.1. Nine (9) companies among a random sample of 60 companies defaulted. The companies were each in the same highly speculative credit rating category: statistically, they represent a random sample from the population of CCC-rated companies. The rating agency contends that the historical (population) default rate for this category is 10.0%, in contrast to the 15.0% default rate observed in the sample. Is there statistical evidence, with any high confidence, that the true default rate is different than 10.0%; i.e., if the null hypothesis is that the true default rate is 10.0%, can we reject the null?

a) No, the t-statistic is 0.39 b) No, the t-statistic is 1.08 c) Yes, the t-statistic is 1.74 d) Yes, the t-statistic is 23.53

209.2. Over the last two years, a fund produced an average monthly return of +3.0% but with monthly volatility of 10.0%. That is, assume the random sample size (n) is 24, with mean of 3.0% and sigma of 10.0%. Further, the population’s returns are normal. Are the returns statistically significant; in other words, can we decide the true mean return is greater than zero with 95% confidence?

a) No, the t-statistic is 0.85 b) No, the t-statistic is 1.47 c) Yes, the t-statistic is 2.55 d) Yes, the t-statistic is 3.83

209.3. Assume the frequency of internal fraud (an operational risk event type) occurrences per year is characterized by a Poisson distribution. Among a sample of 43 companies, the mean frequency is 11.0 with a sample standard deviation of 4.0. What is the 90% confidence interval of the population's mean frequency?

a) 10.0 to 12.0 b) 8.8 to 13.2 c) 7.5 to 14.5 d) Need more information (Poisson parameter)


Answers:

209.1. B. No, the t-statistic is only 1.08.
For a large sample, the distribution is normally approximated, such that at 5.0% two-tailed significance we reject if the abs(t-statistic) exceeds 1.96. The standard error = SQRT(15%*85%/60) = 0.046098; please note: if you used SQRT(10%*90%/60) for the standard error, that is not wrong, but it also would not change the conclusion, as the t-statistic would be 1.29.
The t-statistic = (15%-10%)/0.046098 = 1.08. The two-sided p-value is 27.8%, but as the t-statistic is well below 2.0, we cannot confidently reject. We don't really need the lookup table or a calculator: the t-statistic tells us that the observed sample mean is only 1.08 standard deviations (standard errors) away from the hypothesized population mean. A two-tailed 90% confidence interval implies 1.64 standard errors, so this (72.2% implied confidence) is much less confident than even 90%.

209.2. B. No, the t-statistic is 1.47
The standard error = 10%/SQRT(24) = 0.020412
The t-statistic = (3.0% - 0%)/0.020412 = 1.47. The one-tailed critical t, at 95% with 23 df, is 1.71; two-tailed is 2.07. (Even if we assume normal one-sided, the 95% critical Z is 1.645, of course.)

209.3. A. 10.0 to 12.0
The central limit theorem (CLT) says, if the sample is random (i.i.d.), the sampling distribution of the sample mean tends toward the normal REGARDLESS of the underlying distribution!
The standard error = SQRT(4^2/43) = 4/SQRT(43) = 0.609994. The 90% confidence interval = 11.0 +/- 1.645*0.609994 = 11.0 +/- 1.0 = 10.0 to 12.0 ... did you realize that a 90% two-sided confidence INTERVAL implies the same deviate (1.645) as a 95% one-sided deviate?

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-209-t-statistic-and-confidence-interval.5318/
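A quick Python check of the 209.3 confidence interval (our own sketch, for illustration):

```python
# 209.3: n = 43, sample mean 11.0, sample standard deviation 4.0.
# Per the CLT the sample mean is ~normal regardless of the Poisson
# population, so the two-sided 90% deviate is 1.645.
n, mean, sd = 43, 11.0, 4.0

se = sd / n ** 0.5                              # standard error ~0.61
lo, hi = mean - 1.645 * se, mean + 1.645 * se
print(lo, hi)                                   # ~10.0 to ~12.0
```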


P1.T2.210. Hypothesis testing (Stock & Watson)

AIMs: Explain and apply the process of hypothesis testing: Define and interpret the null hypothesis and the alternative hypothesis; Distinguish between one‐sided and two‐sided hypotheses; Describe the confidence interval approach to hypothesis testing; Describe the test of significance approach to hypothesis testing.

210.1. Your colleague Robert wants to conduct a statistical test to determine whether hedge funds create alpha (i.e., excess return after attribution to all common factor exposure), on average. His test collects a large sample (n > 30), and he is computing the mean (average) excess return such that both the central limit theorem (CLT) and the law of large numbers apply. His null hypothesis is that, based on a sample of returns, the true (population) ex post realized alpha is approximately zero; therefore, the alternative hypothesis is that the true mean is non-zero, and his two-tailed test allows for the possibility that funds destroy alpha via fees. He is going to test his hypothesis with a pre-specified significance level of 5.0%, per convention. In this case, each of the following is true EXCEPT for:

a) He can conduct the test without computing a p-value
b) The probability of erroneously rejecting a true null hypothesis is 5.0%
c) He can reject the null if the t-statistic is greater than 1.96
d) If he reduces the significance level to 1.0%, he reduces the probability of erroneously accepting a false null

210.2. Analyst Jane is concerned that the average days sales outstanding (DSO) in her coverage sector has increased above its historical average of 27.0 days (a lower number is better). From a large sample of 36 companies, she computes a sample mean DSO of 29.0 days with sample standard deviation of 7.0. Her one-sided alternative hypothesis, stated with 95.0% confidence, is that DSO is greater than 27.0. Does she reject the null?

a) No, do not reject one-sided null as the t-statistic is less than 1.65 b) No, do not reject one-sided null as the t-statistic is less than 1.96 c) Yes, do reject one-sided null as the t-statistic is greater than 1.65 d) Yes, do reject one-sided null as the t-statistic is greater than 1.96

210.3. The average capital ratio of a sample of 49 banks is 7.4% with a sample standard deviation of 5.0%. What is the two-sided 95% confidence interval for the population's true average capital ratio; i.e., the random interval that has a 95% probability of containing the population mean?

a) 5.5% to 9.3% b) 6.0% to 8.8% c) 6.7% to 8.1% d) 7.1% to 7.7%


Answers:

210.1. D. A reduction from 5% to 1% significance (becoming "more conservative") offers a trade-off: the probability of erroneously rejecting a true null (Type I error) decreases, but the probability of erroneously accepting a false null (Type II error) increases.

In regard to (A), (B), and (C), each is true about a 5.0% significance test.

210.2. C. Yes, do reject the one-sided null as the t-statistic is greater than 1.65
Standard error = 7/SQRT(36) = 1.16667; t-statistic = (29-27)/1.16667 = 1.714. As 1.71 is greater than 1.645 (i.e., the critical value for a one-tailed 5% significance test of the large-sample MEAN), she rejects the null in favor of the alternative but accepts, conditional on a true null, a 5.0% probability of making a Type I error.

210.3. B. 6.0% to 8.8%
Standard error = 5.0%/SQRT(49) = 0.7143%
Confidence interval = 7.4% +/- (0.7143% * 1.96) = 6.0% to 8.8%

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-210-hypothesis-testing-stock-watson.5336/
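The 210.2 one-sided test in a few lines of Python (our own sketch, for illustration):

```python
# 210.2: one-sided test of mean DSO against the historical 27.0 days
n, x_bar, s, mu_0 = 36, 29.0, 7.0, 27.0

se = s / n ** 0.5              # 1.1667
t = (x_bar - mu_0) / se        # ~1.714
print(t, t > 1.645)            # reject the one-sided null at 5.0%
```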


P1.T2.211. Type I and II errors and p-value (Stock & Watson)

AIM: Define, calculate and interpret type I and type II errors. Define and interpret the p-value.

211.1. The Basel III market risk backtest requires a bank to observe the number of exceptions (aka, exceedances; the number of days on which the VaR loss was exceeded) in order to infer whether the bank's 99.0% 10-day value at risk (VaR) model is accurate. Even before the impossible task of prediction, the observed sample is making an inference about a historical population. The null hypothesis implicit in the Basel backtest is: H(0) = the VaR model is accurate with 99.0% confidence. Therefore, the alternative H(A) = the VaR model is accurate with less than 99.0% confidence. Basel designed three "stoplight" zones to acknowledge the reality that sampling is a statistical test which is necessarily error-prone: a "Green Zone" outcome is when sufficiently few exceptions occur such that the decision should be to accept the model; a "Red Zone" outcome is when too many exceptions occur such that the decision is to reject the model as bad. Which of the following constitutes a Type II error?

a) A Red Zone outcome when the VaR model is 99.0% confident (i.e., gives 99.0% coverage)
b) A Green Zone outcome when the VaR model is, for example, only 97.0% confident
c) A Green Zone outcome when the VaR model is 99.0% confident
d) The Type II error cannot occur in the backtest

211.2. A sample of 144 has a sample mean of 112.9 with sample standard deviation of 60. The null hypothesis is that the true population mean is 100.0; the two-sided alternative hypothesis is that the true population mean is different than 100.0. What is the two-sided p-value?

a) 0.5% b) 1.0% c) 2.5% d) 5.0%

211.3. Your colleague Mary believes that FRM scores are correlated with work experience. Somehow she got hold of data and produced a linear regression, FRMScore = intercept + slope*YearExperience, such that the p-value of the slope coefficient is 1.7% (0.017). Per common practice, the significance test starts with a null hypothesis that the slope is zero, and the two-sided alternative hypothesis is that the slope is non-zero. Which of the following is a correct interpretation of the p-value?

a) If her prespecified significance level is 1.0%, she rejects the null and deems the true slope to be non-zero

b) If her prespecified significance level is 5.0%, she rejects the null and deems the true slope to be non-zero

c) The probability that the slope is zero is 1.7%
d) The probability that the slope is non-zero is 98.3%


Answers:

211.1. B. A Green Zone outcome when the VaR model is, for example, only 97.0% confident
The null is: the 99.0% VaR model is accurate. A Type I error is to mistakenly reject a true null. A Type II error is to mistakenly accept a false null; in this case, to mistakenly decide the model is good (Green Zone) when the model is less than 99.0% confident.

211.2. B. 1.0%
The standard error = 60/SQRT(144) = 60/12 = 5.0. The t-statistic = (112.9 - 100)/5.0 = 2.58. This is the deviate that corresponds to a 1.0% probability under a two-tailed test; put another way, the area under the normal curve to the left of +2.58 sigmas is 99.5%, such that the area under both left and right tails is 1.0%.

211.3. B. If her prespecified significance level is 5.0%, she rejects the null and deems the true slope to be non-zero
The prespecified significance level is 5.0%, which implies a 5.0% rejection region (2.5% in each tail); i.e., if the null is true, a 5.0% chance of mistakenly rejecting (Type I error). A p-value of 1.7% is the "exact significance level": we can reject at higher significance levels (e.g., 5%) but we cannot reject at any lower significance levels (e.g., 1.0%).

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-211-type-i-and-ii-errors-and-p-value-stock-watson.5349/
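To confirm that a 2.58 deviate corresponds to a 1.0% two-sided p-value, here is a minimal Python sketch (ours) using the normal approximation:

```python
import math

# 211.2: n = 144, sample mean 112.9, sample sd 60, null mean 100
z = (112.9 - 100.0) / (60.0 / 144 ** 0.5)        # 2.58
phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))     # standard normal CDF
print(2 * (1 - phi))                             # two-sided p-value ~0.01
```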


P1.T2.212. Difference between two means

AIM: Perform and interpret hypothesis tests for the difference between two means.

212.1. We want to decide whether the average arithmetic return of Fund A is better than the average return of Fund B: the null hypothesis is that the true average difference is zero. For both funds, our sample is 60 months. Over this sample, the average return of Fund A was 2.0% with a sample standard deviation of 3.0%; the average return of Fund B was only 1.0% with a sample standard deviation of 2.0%. With 95% confidence, do we reject the null hypothesis (i.e., fail to accept) and decide that the average return of Fund A was truly better?

a) No, the t-statistic is only 0.465 b) No, the t-statistic is only 1.465 c) Yes, the t-statistic is 2.148 d) Yes, the t-statistic is 5.359

212.2. The average hourly earnings among a sample of 1,500 men is $22.00 with a sample standard deviation of $9.00. The average hourly earnings among a sample of 1,000 women is $20.00 with a sample standard deviation of $6.00. What is the 95% confidence interval for the (two-sided) difference in average earnings between men and women?

a) $0.04 to $3.96 b) $1.41 to $2.59 c) $1.70 to $2.30 d) $1.83 to $2.17

212.3. A credit rating agency wants to compare the difference in default rates between structured notes in two speculative rating categories: SF B versus SF CCC. The default rate among a sample of 1,800 SF B-rated obligors was 5.0%, while the default rate among a sample of 1,000 SF CCC-rated obligors was 8.0%. Default is characterized by a Bernoulli random variable. What is the 95% confidence interval for the difference in default rates?

a) 2.97% to 3.04% b) 2.11% to 3.89% c) 1.75% to 4.25% d) 1.04% to 4.96%


Answers:

212.1. C. Yes, the t-statistic is 2.148; i.e., reject the null.
SE[avg(A) - avg(B)] = SQRT(3%^2/60 + 2%^2/60) = 0.4655%
t-statistic = [(2% - 1%) - 0]/0.4655% = 2.148
The one-sided critical value is 1.64 (note we would also reject against the two-sided critical value of 1.96)

212.2. B. $1.41 to $2.59
SE[avg(men) - avg(women)] = SQRT(9^2/1,500 + 6^2/1,000) = 0.300
95% CI = $2.00 +/- 1.96*0.30 = $1.41 to $2.59

212.3. D. 1.04% to 4.96%
SE(difference in default rate) = SQRT(8%*92%/1,000 + 5%*95%/1,800) = 1.0%
95% CI = 3.0% +/- 1.96*1.0% = 1.04% to 4.96%

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-212-difference-between-two-means.5357/
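The difference-in-means standard error is just the square root of the sum of the two squared standard errors. A minimal Python sketch (ours) for 212.2:

```python
# 212.2: men n=1500, mean $22, sd $9; women n=1000, mean $20, sd $6
n1, x1, s1 = 1500, 22.0, 9.0
n2, x2, s2 = 1000, 20.0, 6.0

se = (s1**2 / n1 + s2**2 / n2) ** 0.5       # 0.30
diff = x1 - x2
print(diff - 1.96 * se, diff + 1.96 * se)   # ~$1.41 to ~$2.59
```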


P1.T2.213. Sample variance, covariance and correlation (Stock & Watson)

AIMs: Define, calculate, and interpret the sample variance, sample standard deviation, and standard error. Define, describe, and interpret the sample covariance and correlation.

213.1. Consider the following five (X,Y) data points: (1,5), (2,4), (3,3), (4,2), (5,1). What is the sample standard deviation of (X)?

a) 0.97 b) 1.58 c) 2.00 d) 2.50

213.2. What is the sample covariance of the following five (X,Y) data points: (1,5), (2,4), (3,3), (4,2), (5,1).

a) -4.00 b) -2.50 c) -1.50 d) -1.00

213.3. Let Y(i) be the sample set of seven (n = 7) observations: 2, 3, 5, 6, 9, 11 and 13. What is the standard error of the sample average, SE(sample average Y)?

a) 1.56 b) 1.68 c) 4.12 d) 7.00

213.4. What is the sample correlation of the following five (X,Y) data points: (2,4), (3,1), (5,3), (7,7), (13,9)?

a) 0.511 b) 0.667 c) 0.744 d) 0.862


Answers:

213.1. B. 1.58
The average of each of (X) and (Y) is 3, such that:
Standard Deviation(X) = Standard Deviation(Y) = SQRT([(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2]/4) = SQRT([(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2]/4) = SQRT(10/4) = 1.58114 ... note that per a SAMPLE variance we divide by (n-1), or 4

213.2. B. -2.50
Sample covariance = [(1-3)(5-3) + (2-3)(4-3) + (3-3)(3-3) + (4-3)(2-3) + (5-3)(1-3)]/(5-1) = [-2*2 + (-1)*1 + 0*0 + 1*(-1) + 2*(-2)]/4 = [-4 - 1 + 0 - 1 - 4]/4 = -10/4 = -2.50

213.3. A. 1.56
As the average = 49/7 = 7.0, the sample variance = [(2-7)^2 + (3-7)^2 + (5-7)^2 + (6-7)^2 + (9-7)^2 + (11-7)^2 + (13-7)^2]/(7-1) = 102/6 = 17.0
Sample standard deviation = SQRT(17) = 4.1231
Standard error (sample average Y) = SQRT(17/7) = 1.558387

213.4. D. 0.862
Sample Variance(X) = 19.0; Sample Variance(Y) = 10.2; Sample Covariance(X,Y) = 12.0
Sample correlation(X,Y) = Sample Covariance(X,Y) / (SQRT[Sample Variance(X)] * SQRT[Sample Variance(Y)]) = 12.0 / [SQRT(19)*SQRT(10.2)] = 0.86199

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-213-sample-variance-covariance-and-correlation-stock-watson.5370/
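Here is a minimal Python sketch (ours, for illustration) reproducing the 213.4 sample statistics, with the (n-1) degrees-of-freedom correction throughout:

```python
# 213.4 data points; sample statistics divide by (n - 1)
xs = [2, 3, 5, 7, 13]
ys = [4, 1, 3, 7, 9]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

def sample_var(v, m):
    return sum((a - m) ** 2 for a in v) / (n - 1)

cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
corr = cov / (sample_var(xs, mx) * sample_var(ys, my)) ** 0.5
print(cov, corr)   # 12.0, ~0.862
```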


Statistics (Gujarati’s Essentials of Econometrics)

P1.T2.57. Methodology of Econometrics
P1.T2.58. Random variables
P1.T2.59. Gujarati’s introduction to probabilities
P1.T2.60. Bayes Theorem
P1.T2.61. Statistical dependence
P1.T2.62. Expectation & variance of variable
P1.T2.63. Chebyshev’s Inequality
P1.T2.64. Covariance of random variables
P1.T2.65. Variance and conditional expectations
P1.T2.66. Skew & Kurtosis
P1.T2.67. Sample variance, covariance, skew, kurtosis
P1.T2.68. Normal distribution
P1.T2.69. Sampling distribution
P1.T2.70. Standard error
P1.T2.71. Central limit theorem (CLT)
P1.T2.72. Student’s t distribution
P1.T2.73. Chi-square distribution
P1.T2.74. F-distribution
P1.T2.75. Confidence interval
P1.T2.76. Critical t-values
P1.T2.77. Confidence interval
P1.T2.78. Estimator properties
P1.T2.79. Hypothesis testing
P1.T2.80. Confidence intervals
P1.T2.81. Type I, Type II & p value
P1.T2.82. Chi-square and F-ratio


P1.T2.57. Methodology of Econometrics

AIMs: Describe the methodology of econometrics. Distinguish between the different types of data used for empirical analysis. Describe the process of specifying, interpreting, and validity testing an econometric model.

57.1 From the perspective of data collection, an econometrician is most similar to a(n):

a) Mathematician b) Particle physicist c) Experimental research sociologist d) Meteorologist

57.2 Which of the following most nearly captures the essential difference between econometrics and mathematical economics?

a) Non-linear models b) Stochastic error (disturbance) term c) Slope coefficient d) Causation

57.3 Assume that our econometric model is: Y = B1 + B2*X + u, where B1 is the intercept coefficient, B2 is the slope coefficient, and (u) is the disturbance/error term. Further, the slope is very significant and the fit is very good (high R^2). Which is the best econometric statement?

a) X causes Y (causation) b) X predicts Y c) X is economically significant d) None of these statements are strictly justified

57.4 In order to conduct an econometric study, Gujarati recommends the following eight (8) generic steps:

1) Statement of theory or hypothesis
2) Obtaining the data
3) Specification of the mathematical model of the theory
4) Specification of the statistical, or econometric, model
5) Estimation of the parameters of the econometric model
6) Checking for model adequacy: model specification testing
7) Hypothesis testing
8) Forecasting or prediction

At which step do we generate the ordinary least squares (OLS) regression line; i.e., “run the regression”?

a) Step 3 b) Step 4 c) Step 5 d) Step 6


57.5 At which step do we most endeavor to minimize the oft-dreaded MODEL RISK?

a) Step 5 b) Step 6 c) Step 7 d) Step 8

57.6 At which step are we most likely to use the p-value in an attempt to determine whether the population slope has a certain value?

a) Step 5 b) Step 6 c) Step 7 d) Step 8


Answers:

57.1 D. (meteorologist)
Gujarati on econometrics (emphasis mine): “Although mathematical statistics provides many tools used in the trade, the econometrician often needs special methods in view of the unique nature of most economic data, namely, that the data are not generated as the result of a controlled experiment. The econometrician, like the meteorologist, generally depends on data that cannot be controlled directly. As Spanos correctly observes: ‘In econometrics the modeler is often faced with observational as opposed to experimental data. This has two important implications for empirical modeling in econometrics. First, the modeler is required to master very different skills than those needed for analyzing experimental data. . . . Second, the separation of the data collector and the data analyst requires the modeler to familiarize himself/herself thoroughly with the nature and structure of data in question.’” … The inherently non-experimental nature of econometrics is meaningful because it implies that we expect imperfect data (e.g., mis-measured, omitted) and need tools in anticipation.

57.2 B. (random error term; aka, random disturbance term)
Gujarati explains the difference between: Labor Participation = intercept + slope * Unemployment Rate; i.e., a mathematical model, and Labor Participation = intercept + slope * Unemployment Rate + random error/disturbance (u); i.e., an econometric model.
Gujarati: “We let (u) represent all those forces (besides the independent variable) that affect [the dependent variable] but are not explicitly introduced in the model, as well as purely random forces. The error term distinguishes econometrics from purely mathematical economics.”

57.3 B. (X predicts Y)
The regression model, of course, captures CORRELATION NOT CAUSATION (the term “correlation” would make the question too easy!). As Gujarati says, regression does not imply causation. Further, it cannot: “our ideas of causation must come from outside statistics, ultimately from some theory or other.” However, we do indeed call the relationship PREDICTIVE. Gujarati: “It is better to call the relationship a predictive relationship” (emphasis his). And, please note, we often refer to the dependent variable as the “Predicted Y”.

57.4 C. Step 5 (Estimation of the parameters of the econometric model)

57.5 B. Step 6 (Checking for model adequacy: model specification testing) … here we evaluate the number of explanatory variables and specification errors (e.g., variable omission, wrong functional form)

57.6 C. Step 7 (Hypothesis testing)
Please note this is not an entirely fair question: there is an argument for Steps 5, 6, or 7; e.g., you can argue the p-values are generated at Step 5.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-57-methodology-of-econometrics.3601/


P1.T2.58. Random variables

Please note: I continue to write fresh questions according to the sequence of GARP’s AIMs. The current section begins Gujarati Chapters one through eight (1-8). Some of these questions are more difficult than you will find on the exam. And, some terms may be unfamiliar. But, in some cases, the idea is to “sneak in” additional FRM concepts. We have found this built-in cross-referencing of questions to be an effective use of your time – David

AIM: Define random variables, and distinguish between continuous and discrete random variables.

58.1. Which of the following is typically NOT treated as a random variable?

a) Drift component of geometric Brownian motion (GBM) b) Volatility component of GBM c) Probability of default (PD) d) Exogenous bid-ask spread in liquidity-adjusted value at risk (LVaR)

58.2 Each of the following random variables tends to be characterized by a CONTINUOUS random variable EXCEPT for (bonus if you can identify the most common applicable distribution!):

a) Operational loss severity b) Exceptions in a VaR backtest c) Loss given default (or recovery rate) d) Waiting time until next default

58.3 Each of the following random variables tends to be characterized by a DISCRETE random variable EXCEPT for (bonus if you can identify the most common applicable distribution!):

a) Operational loss frequency b) Defaults in a basket credit default swap (basket CDS) c) Bond Default d) Asset returns


Answers:

58.1 A. GBM: dS(t)/S(t) = u*dt + sigma*dz, where u = drift and sigma = volatility; dz (i.e., dW(t)) is the stochastic (random) term. Symbolically, the Brownian motion which underlies BSM is: log return = drift*time + volatility * SQRT(time) * standard random normal. That is, the process is random/stochastic due to the second term, but the drift is deterministic.

In regard to (D), we can treat the bid-ask spread, as the liquidity risk input into LVaR, as either a constant or a random variable under the “exogenous spread approach”

58.2 B. VaR backtest applies binomial as losses EITHER exceed or do not exceed the VaR; a series of yes/no (Bernoulli) is a binomial.

In regard to (A), OpLoss severity has many forms, but for the tail, extreme value theory (EVT), which contains the peaks-over-threshold (POT) approach, is popular

In regard to (C), beta distribution is popular for LGD/recovery due to its flexibility

In regard to (D), “time” should betray a continuous idea. The exponential is used here. Interestingly, the discrete Poisson (e.g., number of losses) maps to the continuous exponential (time until the next loss).

58.3 D. (Asset returns are typically assumed to be continuous; e.g., lognormal)

In regard to (A), OpLoss frequency is often characterized by Poisson

In regard to (B), really basic is binomial (but assumes i.i.d., so probably need better)

In regard to (C), default is a Bernoulli.

Discrete variables can be counted (0, 1, 2, ...): think coin flip or die roll. Continuous variables must be measured: think time, distance, or asset returns.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-58-random-variables.3603/


P1.T2.59. Gujarati’s introduction to probabilities

AIM: Define the probability of an event. Describe the relative frequency or empirical definition of probability. Describe and interpret the probability mass function, probability density function, and cumulative density function for a random variable. Distinguish between univariate and multivariate probability density functions.

59.1 If each outcome has an equal chance of occurring and the outcomes are mutually exclusive, the P(outcome A) = number of outcomes favorable to A / total number of outcomes. Which type of probability is this?

a) A priori b) A posteriori c) Bayes Theorem d) Relative frequency

59.2 If a bank’s 99% daily value at risk (VaR) is determined by simple historical simulation (HS), which probability is used?

a) Classical b) A priori c) Relative frequency (empirical) d) Parametric (analytical)

59.3 Consider the statement: “Our bank’s 99% daily VaR is $1 million.” This reflects which generic probability function?

a) PMF b) PDF c) CDF d) None of the above

59.4 Consider the statement: Each SINGLE ROW of a credit migration (transition) matrix is itself an empirical probability distribution.

a) True because the outcomes sum to 1.0 (100%) b) True because the outcomes are exclusive c) True because the probabilities are empirical d) True because all of the above are true


59.5 Which of the following necessarily implies a multivariate probability distribution (while the others can imply a univariate probability distribution)?

a) Poisson to model frequency of an operational loss
b) Copula to model default dependence in a CDO/basket CDO
c) Binomial to model probability of defaults reaching the mezzanine tranche in a basket CDS where the credits are i.i.d.
d) Exponential to model (waiting) time until default for a single credit given the hazard rate (a.k.a., default intensity)

59.6 Bayes Theorem says that P(A|B) is given by:

a) Conditional (A|B) / Marginal (B) b) Conditional (B|A) / Marginal (B) c) Joint P(AB) / Marginal (A) d) Joint P(AB) / Marginal (B)


Answers:

59.1 A. (A priori)
We can deduce the probability prior to any experience; e.g., in the case of rolling a die or picking a card from a deck, we can imagine the odds without the need for running experiments and collecting observations.

59.2 C. (empirical)
Historical simulation calibrates the confidence% VaR (e.g., 99%) based on the (1 - confidence%; e.g., 1%) worst loss experienced in the historical sample. This is the essence of an empirical distribution.

In regard to (A) and (B), which are the same (classical = a priori), these reflect probabilities deduced before data is observed; the empirical approach is instead posterior, after data is observed.

In regard to (D), there is no parametric/statistical distribution assumed. This is the key ADVANTAGE of HS: it does not make an assumption about a (parametric) distribution and therefore, arguably, lends itself more easily to heavy tails.

59.3 C. (VaR is a CDF quantile)
In this case, the statement is equivalent to “1% of the time, we expect to lose at least $1 million;” i.e., P[ABS(loss) >= $1 million] = 1% is the same as P[loss <= -$1 million] = 1%, which is a CDF.

59.4 D. (all of the above are true)
Each single row contains exclusive probabilities that a credit/obligor will end the period with a certain rating; the probabilities sum to 1.0 and are empirical.

59.5 B. (copula is multivariate)
A copula is a function that “joins” marginal distributions together, using the function to incorporate the dependence, into a multivariate probability function. In regard to (C), the i.i.d. assumption is key to the binomial and enables the univariate distribution: if i.i.d. applies, then all defaults are characterized by the same, single variable, P[default], which is a Bernoulli. The collection of i.i.d. Bernoullis (each with the same p) is a binomial.

59.6 D. P(A|B) = P(B|A)*P(A) / P(B) = P(AB)/P(B) = joint(AB)/marginal(B)

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-59-gujarati%E2%80%99s-introduction-to-probabilities.3606/


P1.T2.60. Bayes Theorem

AIM: Define Bayes’ theorem and apply Bayes’ formula to determine the probability of an event.

60.1 A bank develops a new 99% confidence value at risk (VaR) model. Assume there is a 50% chance the model is good and a 50% chance the model is bad (bad = not good). A good 99% VaR model produces an exception (a loss in excess of VaR) 1% of the time. The bad VaR model will produce an exception 3% of the time. If we observe an exception, what is the probability the model is good?

a) 1.0% b) 25.0% c) 50.0% d) 66.7%


Answer:

60.1 B. (25%)
Let P(G) = unconditional probability that the model is good = 50%
Let P(G') = unconditional probability that the model is bad = 50%
Let P(E|G) = probability of an exception conditional on a good model = 1%, such that P(E'|G) = probability of no exception conditional on a good model = 99%
Let P(E|G') = probability of an exception conditional on a bad model = 3%, such that P(E'|G') = probability of no exception conditional on a bad model = 97%
Bayes says the probability of a good model conditional on an observed exception is given by:
P(G|E) = P(GE)/P(E) = (50%*1%)/[(50%*1%) + (50%*3%)] = 25%

Cross-reference: Here are four (4) more Bayes’ Theorem practice questions: http://www.bionicturtle.com/forum/threads/question-35-probability-quantitative.2128 Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-60-bayes-theorem.3609/


P1.T2.61. Statistical dependence

AIMs: Describe marginal and conditional probability functions. Explain the difference between statistical independence and statistical dependence.

61.1 It is popular to characterize loss given default (LGD) with a beta distribution due to its flexibility. If LGD has a mean of 75% under the assumption of a beta PDF, which type of probability function is this?

a) Marginal b) Unconditional c) Conditional d) Joint

61.2 An analyst screens for stocks using a technical screen and a fundamental screen among the universe of 15,000 US publicly traded companies. The marginal (unconditional) probability of a stock meeting the technical screen is 10%; i.e., P[pass technical screen] = 10%. The probability of a stock meeting the fundamental screen conditional on meeting the technical screen is 30%; i.e., P[pass fundamental screen | passed the technical screen] = 30%. What is the JOINT probability that a stock passes both screens?

a) 1.0% b) 3.0% c) 12.0% d) 15.0%

61.3 Add the following to the above assumptions: The probability that a stock passes the fundamental screen conditional on failing the technical screen is 5.0%; i.e., P[pass fundamental screen | fail technical screen] = 5.0%. If we observe that a stock passed the fundamental screen, what is the posterior probability that the stock passed the technical screen?

a) 10% b) 20% c) 30% d) 40%


61.4 If expected loss (EL) is the product of the probability of default (PD) and loss given default (LGD), what is the condition that must be satisfied in order for PD and LGD to be statistically independent (note that both PD and LGD are probability functions: PD is a Bernoulli PMF and LGD may use a different distribution but also falls within [0,1])?

a) EL = PD*E(LGD) always b) EL = PD*E(LGD) at least some of the time c) EL = PD*E(LGD) + COV(PD,LGD) always d) EL = PD*E(LGD) + COV(PD,LGD) at least some of the time

61.5 Which is the most accurate condition for the statistical independence of two variables (X) and (Y)?

a) Their correlation is zero: rho(X,Y) = 0 b) Their covariance is zero: COV(X,Y) = 0 c) marginal P(X)*marginal P(Y) = marginal P(X)*P(Y|X) = marginal P(Y)*P(X|Y) d) P(X|Y) = Joint (X,Y)/marginal (X)


Answers:

61.1 C. (Conditional) LGD = E[loss | default]; i.e., expected loss conditional on a default. The beta PDF is not particularly relevant.

61.2 B. (3.0%) Joint (T,F) = marginal (T) * conditional (F|T) = marginal (F) * conditional (T|F). In this case, 10% marginal * 30% conditional = 3.0% joint.

61.3 D. (40%) P(T) = 10% and P(T') = 90%; i.e., marginal or unconditional probabilities. P[F|T] = 30% and P[F|T'] = 5%; i.e., conditional probabilities. According to Bayes' Theorem, P[T|F] = joint(T,F)/marginal(F) = 3%/(10%*30% + 90%*5%) = 40.0%

61.4 A. Two variables (X) and (Y) are statistically independent if and only if their joint PMF/PDF is equal to the product of their marginal PMF/PDFs, for ALL COMBINATIONS of (X) and (Y) values.

61.5 C. Independence holds if and only if the product of marginals is equal to the joint probability. Both marginal P(X)*P(Y|X) and marginal P(Y)*P(X|Y) are equivalent to Joint (X,Y).

In regard to (A) and (B), independence implies correlation and covariance are zero. However, these are measures of linear dependence, which is a narrow measure of dependence (e.g., copulas can handle non-linear dependencies), such that the CONVERSE is not true.

In regard to (D), this is simply a true statement regardless of independence

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-61-statistical-dependence.3612/


P1.T2.62. Expectation & variance of variable

AIMs: Define, calculate and interpret the expected value of a random variable. Define, calculate and interpret the variance of a random variable.

62.1 A random variable (r.v.) has the following PMF: f(X) = bX over the domain X = [0, 1, 2, 3, 4]; e.g., f(1) = b, f(2)= 2b. What is the expected value of X?

a) 2.0 b) 2.2 c) 2.5 d) 3.0

62.2 What is the variance of X?

a) 1.00 b) 1.33 c) 1.50 d) 2.00

62.3 Assume a $100 par bond has a probability of default (PD) of 10%, where default is characterized by a Bernoulli distribution, and if there is a default, no recovery is expected such that loss given default (LGD) is a non-random 100%. What is the standard deviation of, respectively, the loss of (i) the one bond and (ii) ten of the bonds under an i.i.d. assumption?

a) $10 and $100.00 b) $10 and $300.00 c) $30 and $300.00 d) $30 and $94.87

62.4 What is the standard deviation of the sum of a roll of, respectively, (i) 10 six-sided dice and (ii) 20 six-sided dice?

a) 2.92 and 29.2 b) 3.16 and 4.47 c) 5.4 and 7.6 d) 5.4 and 58.3


Answers:

62.1 D. (EV = 3.0) Since f(x) is a PMF, the sum of f(x) must equal 1.0, such that 0 + b + 2b + 3b + 4b = 1.0. Therefore, b = 1/10 = 0.1. E[X] = 0 + 1*0.1 + 2*0.2 + 3*0.3 + 4*0.4 = 3.0; or 30/10 = 3.0

62.2 A. (Variance and standard deviation = 1.0) Variance(X) = E(X^2) - [E(X)]^2. E(X^2) = 0 + 0.10*1^2 + 0.20*2^2 + 0.30*3^2 + 0.40*4^2 = 10; such that Variance(X) = 10 - 3^2 = 1.0

62.3 D. ($30 and $94.87) The standard deviation of the Bernoulli = SQRT[p(1-p)] = SQRT[10%*90%] = 0.30. In this case, $100 * 0.3 = $30.00. A series of i.i.d. Bernoulli variables is, by definition, a binomial. The standard deviation = $100 * SQRT[n*p*(1-p)] = $100 * SQRT[10*10%*90%] = $100 * SQRT(0.9) = $100 * 0.949 = $94.87. Note that, per i.i.d., the variance scales by 10x, but the standard deviation scales by SQRT(10) = 3.16 = $94.87/$30.00.

62.4 C. (5.4 and 7.6) The point of this is to give further practice with the key formula: Variance(X) = E[X^2] - [E(X)]^2. The variance of a single six-sided die = E[X^2] - [E(X)]^2 = 15.17 - 3.5^2 ~= 2.9167. The variance of n i.i.d. rolls = n*2.9167, such that the standard deviation of the sum of 10 dice = SQRT(10*2.9167) = 5.40, and the standard deviation of the sum of 20 dice = SQRT(20*2.9167) = 7.64.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-62-expectation-variance-of-variable.3614/
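For readers who want to replicate 62.1, 62.2 and 62.4, here is a minimal Python sketch (plain Python, no libraries; the PMF and dice values come from the questions above):

xs = [0, 1, 2, 3, 4]
ps = [0.0, 0.1, 0.2, 0.3, 0.4]     # f(x) = 0.1*x, which sums to 1.0
mean = sum(p * x for p, x in zip(ps, xs))               # E[X] = 3.0
var = sum(p * x**2 for p, x in zip(ps, xs)) - mean**2   # E[X^2] - E[X]^2 = 1.0
print(mean, var)

# One six-sided die: E[X^2] - E[X]^2 = 91/6 - 3.5^2 ~= 2.9167
die_var = sum(x**2 for x in range(1, 7)) / 6 - 3.5**2
print((10 * die_var) ** 0.5, (20 * die_var) ** 0.5)     # ~5.40 and ~7.64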


P1.T2.64. Covariance of random variables

AIM: Define, calculate and interpret the covariance and correlation of two random variables.

64.1 A portfolio (P) has expected return volatility of 16% and a beta with respect to its benchmark of 1.20. The benchmark (B) has expected return volatility of 10%; i.e., volatility (P) =16% and volatility (B) =10%. If tracking error (TE) is the standard deviation of (P-B), what is the ex ante (expected) tracking error of the portfolio with respect to its benchmark?

a) 10.8% b) 12.6% c) 14.4% d) Not enough information

64.2 If we want to compute a variance of a portfolio’s rate of return, and the portfolio has ten positions, how many unique entries do we require in the covariance matrix?

a) 45 cells and a vector of position weights b) 45 cells and vector of position weights and a vector of volatilities c) 55 cells and a vector of position weights d) 55 cells and a vector of position weights and a vector of volatilities

64.3 A portfolio consists of two uncorrelated, equally-weighted positions. The first position (A) has volatility of 10%, the second position (B) has volatility of 30%. What is the beta of position B with respect to the two-asset portfolio (which itself contains position B)?

a) 0.67 b) 1.33 c) 1.6 d) 1.8


Answers:

64.1 A. VARIANCE(P - B) = VARIANCE(P) + VARIANCE(B) - 2*COV(P,B). In this case, TE(P,B) = SQRT[VARIANCE(P - B)] = SQRT(16%^2 + 10%^2 - 2*10%*16%*correlation), and since beta(P,B) = correlation(P,B) * vol(P)/vol(B), correlation(P,B) = beta(P,B)*vol(B)/vol(P) = 1.2*10%/16% = 0.75, such that TE(P,B) = SQRT(16%^2 + 10%^2 - 2*10%*16%*0.75) = 10.8%

64.2 C. The covariance matrix is 10*10 with 10 cells in the diagonal and 45 in each triangle (10*11/2 - 10 = 45). If this were a correlation matrix, 1.0s would be in the diagonal and we would only require 45 unique pairwise correlations. However, in the covariance matrix the diagonal contains variances: covariance(position, position) = variance(position). So there are 45 covariances + 10 variances = 55 unique cells. We do need the position weights, but correlations are not required because they are already implicit in the covariance matrix; i.e., the covariance matrix is itself already the product of a correlation matrix and two volatility vectors.

64.3 D. (Beta of B with respect to portfolio = 1.80)
Portfolio variance = w^2*variance(A) + w^2*variance(B) + (2)(w)(w)*Covariance(A,B) = 50%^2*10%^2 + 50%^2*30%^2 + (2)(50%)(50%)*(covariance = 0) = 0.0250
beta(B,P) = Cov(B,P)/Var(P) = Cov(B, 0.5A + 0.5B)/Var(P), and since correlation(A,B) = 0, 0.5*Cov(A,B) = 0, such that beta(B,P) = 0.5*Cov(B,B)/Var(P) = 0.5*Var(B)/Var(P) = 0.5*30%^2/0.025 = 1.8 is the beta of asset B with respect to the portfolio.
Please note the following covariance property: if X, Y, W, and V are random variables and a, b, c, d are constants, then Cov(aX + bY, cW + dV) = ac*Cov(X,W) + ad*Cov(X,V) + bc*Cov(Y,W) + bd*Cov(Y,V); see http://en.wikipedia.org/wiki/Covariance
Such that Cov(B, 0.5A + 0.5B) = Cov(0A + 1B, 0.5A + 0.5B) = 0*0.5*Cov(A,A) + 0*0.5*Cov(A,B) + 1*0.5*Cov(B,A) + 1*0.5*Cov(B,B) = 0.5*Cov(B,A) + 0.5*Cov(B,B), and since Cov(B,B) = Variance(B), this equals 0.5*Cov(B,A) + 0.5*Var(B). Further, if Correlation(A,B) = 0, then Cov(B,A) = 0, since Cov(B,A) = Correlation(A,B)*StdDev(A)*StdDev(B).

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-64-covariance-of-random-variables.3620/
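A minimal Python sketch of 64.1 and 64.3, using the inputs given in the questions:

vol_p, vol_b, beta_pb = 0.16, 0.10, 1.20
rho = beta_pb * vol_b / vol_p                        # implied correlation = 0.75
te = (vol_p**2 + vol_b**2 - 2 * rho * vol_p * vol_b) ** 0.5
print(te)                                            # ~0.108, i.e., 10.8%

# 64.3: beta of B versus a 50/50 portfolio of uncorrelated A (10%) and B (30%)
var_port = 0.5**2 * 0.10**2 + 0.5**2 * 0.30**2       # the cross term is zero
print(0.5 * 0.30**2 / var_port)                      # Cov(B,P)/Var(P) = 1.8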


P1.T2.65. Variance and conditional expectations

AIMs: Define, calculate and interpret the mean and variance of a set of random variables. Describe the difference between conditional and unconditional expectation.

65.1 Assume a two-asset portfolio where the weight of the first asset is (w) and the weight of the second asset is (1-w). The first asset has return volatility of 10%, the second asset has return volatility of 20%, and their returns have a correlation of +0.25. Take the first derivative of the formula for portfolio variance to find the weight of the first asset that produces the minimum variance portfolio; i.e., the local minimum is where the first derivative with respect to (w) is equal to zero. What is the weight (w) that produces the minimum variance portfolio?

a) 50.0% b) 67.5% c) 75.0% d) 87.5%

65.2 Three of the following concepts imply a conditional expectation or probability. Which one is the exception and implies an unconditional expectation?

a) Expected shortfall b) GARCH(1,1) c) P(AB) / P (A | B) d) Hazard rate (a.k.a., default intensity)


Answers:

65.1 D. (87.5%)
Portfolio variance (V) = 10%^2*w^2 + 20%^2*(1-w)^2 + 2*w*(1-w)*10%*20%*0.25, such that
V = 0.01*w^2 + 0.04*(1-w)^2 + 0.01*w*(1-w),
V = 0.01w^2 + 0.04 - 0.08w + 0.04w^2 + 0.01w - 0.01w^2,
V = 0.04w^2 - 0.07w + 0.04.
The first derivative with respect to (w) gives dV/dw = 0.08w - 0.07; set that equal to zero for the local minimum, such that 0 = 0.08w - 0.07 and w = 7/8 = 0.875 or 87.5%.
And we can check: for w = 87.5%, portfolio volatility = SQRT(portfolio variance) = 9.683%, which is the minimum variance portfolio.

65.2 C. Conditional P(A|B) = joint P(AB) / unconditional P(B), such that: Unconditional P(B) = joint P(AB) / Conditional P(A|B)

In regard to (A), expected shortfall (a.k.a., conditional tail loss) is a conditional expectation: E[L | L > VaR].

In regard to (B), the “C” in GARCH(1,1) refers to conditional, as this process models a conditional variance.

In regard to (D), the hazard rate is a conditional probability of default: P(default | survival through previous periods).
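A minimal Python sketch of 65.1; it uses the closed-form minimum-variance weight w* = (s2^2 - rho*s1*s2)/(s1^2 + s2^2 - 2*rho*s1*s2), which is exactly the solution of dV/dw = 0 derived above:

s1, s2, rho = 0.10, 0.20, 0.25
w = (s2**2 - rho * s1 * s2) / (s1**2 + s2**2 - 2 * rho * s1 * s2)
v = w**2 * s1**2 + (1 - w)**2 * s2**2 + 2 * w * (1 - w) * rho * s1 * s2
print(w, v ** 0.5)   # 0.875 and ~0.09683 (the 9.68% minimum volatility)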

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-65-variance-and-conditional-expectations.3633/


P1.T2.66. Skew & Kurtosis

AIMS: Define, calculate and interpret the skewness and kurtosis of a random variable. Describe and identify a platykurtic and leptokurtic distribution. Define the skewness and kurtosis of a normally distributed random variable.

66.1 Assume random variable (r.v.) X has the PMF f(X) = b*(4 - X) over the domain X = [0, 1, 2, 3, 4]; e.g., f(0) = 4b, f(1) = 3b, f(4) = 0.

What is the skewness of X?

a) 0.0 b) 0.2 c) 0.4 d) 0.6

66.2 What is the kurtosis of X?

a) 2.2 b) 2.6 c) 3.2 d) 3.8

66.3 Which of the following distributions is necessarily platykurtic?

a) Normal b) Empirical based on historical simulation c) Uniform distribution d) Student’s t


66.4 Which of the following distributions is necessarily leptokurtic?

a) Normal b) Empirical based on historical simulation c) Uniform d) Student’s t

66.5 If the daily loss (L) is normally distributed such that L ~ N(mean = $20, variance = $16), what is the probability on a given day that the loss will exceed $30?

a) 0.62% b) 0.92% c) 1.32% d) 2.62%


Answers:

66.1 D. (skew = 0.6) Please note that because 4b + 3b + 2b + 1b + 0 = 1.0, 10b = 1.0 such that b = 0.10. Skew = third central moment / StdDev^3 = E[X - mu(X)]^3 / StdDev(X)^3 = 0.60 / 1^3 = 0.60

66.2 A. (kurtosis = 2.2) Kurtosis = fourth central moment / StdDev^4 = E[X - mu(X)]^4 / StdDev(X)^4 = 2.2 / 1^4 = 2.20

66.3 C. (Kurtosis of uniform = 1.8) In regard to (B), an empirical distribution can be either light- or heavy-tailed.

66.4 D. (Student's t) Student's t has excess kurtosis = 6/(d.f. - 4) so that, as d.f. increases and it tends toward the normal, the heavy tails are tending toward normal but are always, even if slightly, heavy-tailed. (But the student's t is not going to give us meaningfully heavy tails; for meaningfully heavy tails, we look to other distributions.)

66.5 A. (0.62%) Z = (30 - 20)/SQRT(16) = 2.5 standard deviations. P(Z > 2.5) = 1 - NORM.S.DIST(2.5) = 0.62%

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-66-skew-kurtosis.3619/
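A minimal Python sketch of 66.1 and 66.2, using the PMF f(x) = 0.1*(4 - x) over x = 0 to 4 (the PMF implied by the answer's 4b + 3b + 2b + 1b + 0 = 1.0):

xs = range(5)
ps = [0.1 * (4 - x) for x in xs]                      # 0.4, 0.3, 0.2, 0.1, 0.0
mu = sum(p * x for p, x in zip(ps, xs))               # mean = 1.0
var = sum(p * (x - mu)**2 for p, x in zip(ps, xs))    # variance = 1.0
m3 = sum(p * (x - mu)**3 for p, x in zip(ps, xs))     # third central moment = 0.6
m4 = sum(p * (x - mu)**4 for p, x in zip(ps, xs))     # fourth central moment = 2.2
print(m3 / var**1.5, m4 / var**2)                     # skew 0.6, kurtosis 2.2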


P1.T2.67. Sample variance, covariance, skew, kurtosis

AIMs: Distinguish between population and sample, and calculate the sample mean, variance, covariance, correlation, skewness, and kurtosis.

67.1 Harry computed a population standard deviation of four (sigma = 4) from a sample size of ten (n = 10), but realizes he should have computed a sample standard deviation. What is the sample standard deviation?

a) 3.78 b) 4.00 c) 4.22 d) 4.44

67.2 Assume the following set of (Y) and (X) variables which constitute a small sample (n = 5):

What is the sample covariance?

a) 1.60 b) 1.80 c) 2.00 d) 2.20

67.3 What is the sample correlation?

a) 0.50 b) 0.66 c) 0.75 d) 0.80


67.4 What is the sample skew of (Y)?

a) -0.25 b) 0.00 c) +0.25 d) +0.35

67.5 What is the sample kurtosis of (Y)?

a) 1.60 b) 2.20 c) 3.00 d) 4.20


Answers:

67.1 C. (4.22) Let Y = sum of [(Xi - avg X)^2] for each Xi. Population Std Dev = SQRT[Y / 10] and Sample Std Dev = SQRT[Y / 9]. In this case, 4 = SQRT[Y / 10], such that 4^2 * 10 = Y = 160. The sample standard deviation is therefore given by SQRT[160/9] = 4.22

67.2 C. (2.0) Sum of cross products = 8.0 and 8.0/(n-1) = 8/4 = 2.0

67.3 B. (0.66) Sample covariance = 2.0; sample Std Dev (Y) ~= 1.92; sample Std Dev (X) ~= 1.58. Sample correlation coefficient = sample covariance / [sample std dev (X) * sample std dev (Y)] = 2.0/(1.92*1.58) ~= 0.66

67.4 D. (+0.35) Sample third moment = 2.52. Sample skew = sample third moment / sample variance^(3/2) = 0.35

67.5 A. (1.60) Sample fourth moment = 21.84. Sample kurtosis = sample fourth moment / sample variance^2 = 1.60

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-67-sample-variance-covariance-skew-kurtosis.3636/
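A minimal Python sketch of the 67.1 conversion (divisor n versus divisor n - 1):

n, pop_sd = 10, 4.0
ss = pop_sd**2 * n              # sum of squared deviations = 160
print((ss / (n - 1)) ** 0.5)    # sample standard deviation ~= 4.216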


P1.T2.68. Normal distribution

AIM: Describe the key properties of the normal distribution and the standard normal distribution.

68.1 Assume random variable (X) is normally distributed with mean of 10 and a variance of 25. Without using a calculator, what is the probability that X falls within 1.75 and 21.65; i.e., P(1.75 < X < 21.65)?

a) 68% b) 94% c) 95% d) 99%

68.2 Each of the following is a property of the normal distribution EXCEPT for:

a) Fully described by two parameters b) ~95% of the area under the curve lies +/- 3 standard deviations from the mean c) A linear combination of two (or more) normally distributed random variables is itself normally distributed d) Skew = 0 and kurtosis = 3


Answers:

68.1 B. (94%)
(1.75 - 10)/SQRT(25) = -1.65
(21.65 - 10)/SQRT(25) = 2.33
P(Z < -1.65) = 5% and P(Z < 2.33) = 99%; therefore, P(-1.65 < Z < 2.33) = 99% - 5% = 94%

68.2 B.
~68% of the area under the curve lies +/- 1 standard deviation from the mean;
~95% of the area under the curve lies +/- 2 standard deviations from the mean;
~99.7% of the area under the curve lies +/- 3 standard deviations from the mean.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-68-normal-distribution.3640/
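A minimal Python sketch of 68.1; scipy is an assumed dependency here:

from scipy.stats import norm
z_lo = (1.75 - 10) / 25**0.5             # -1.65
z_hi = (21.65 - 10) / 25**0.5            # +2.33
print(norm.cdf(z_hi) - norm.cdf(z_lo))   # ~0.94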


P1.T2.69. Sampling distribution

AIM: Discuss the concept of random sampling and the sampling distribution of an estimator.

69.1 Which is the best description of a RANDOM SAMPLE of size (n)?

a) The mean of the sample is unknown but converges to the population mean b) The population PDF/PMF is unknown for each draw c) There exists a PDF/PMF for each draw but there is no pattern among the X(i)s d) Each X(i) has the same PDF/PMF and is drawn independently

69.2 Which is a second moment of a sampling distribution?

a) Root unit b) Standard error c) Degrees of freedom d) Variance ratio

69.3 Each of the following is TRUE except for:

a) The population mean has a sampling distribution with variance = variance/n b) A sampling distribution is a probability distribution where the random variable is an

estimator c) The frequency distribution of sample means may be called an empirical sampling

distribution d) CLT asserts the sample mean is approximately normal regardless of population

distribution


Answers:

69.1 D. A random sample draws i.i.d. random variables. There are two aspects to the i.i.d.: (1) independent, i.e., each draw is independent of the next (no autocorrelation); and (2) identical, i.e., each X(i) has the same PDF/PMF (or CDF) probability distribution.

69.2 B. (Standard error) The standard error is the square root of the variance of the sampling distribution of the sample mean (i.e., the sample mean is an estimator with its own distribution).

69.3 A. (B, C, and D are true) The population has one “true” mean and variance; a sample consisting of the entire population is the only sample that will not produce sampling variation. Put another way, the population mean has zero variance.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-69-sampling-distribution.3641/


P1.T2.70. Standard error

AIM: Define and calculate the standard error of a sample mean.

70.1 What is sampling error?

a) sample variance b) sample variance / n c) sample standard deviation / SQRT(n) d) absolute difference of | sample mean - population mean|

70.2 Which most nearly describes the standard error of a sample mean?

a) Square root of the variance of an estimator b) Square root of the population variance c) Variance of the sample mean as an estimate d) Variance of the sample

70.3 Assume the population of hedge fund returns has an unknown distribution with mean of 8% and volatility of 10%. From a sample of 40 funds, what is the probability the sample mean return will exceed 10%?

a) 10.0% b) 10.3% c) 10.7% d) 11.1%

70.4 Assume instead we know the population of hedge fund returns is normal with mean of 8% but unknown volatility. What is the probability that the sample mean return (n = 40 is the sample size) will exceed 10%?

a) 10.0% b) 10.3% c) 10.7% d) 11.1%


Answers:

70.1 D. Sampling error is the (absolute) difference between the sample mean and the population mean.

70.2 A. The sample mean is an estimator of the population mean. With repeated draws, the sample mean is itself a random variable characterized by the sampling distribution of the sample mean. This sampling distribution has a variance and a standard deviation (a.k.a., standard error). In this way, the standard error is just a standard deviation where the distribution happens to be the sampling distribution of the sample mean as an estimator of the population mean.

70.3 B. (10.3%) As this is a large sample (n >= 30), CLT says the sample mean is approximately normal; i.e., ~N(population mean, variance = population variance/n) or ~N(population mean, standard deviation = population standard deviation/SQRT(n)). In this case, sample mean ~N(mean = 8%, variance = 10%^2/40) or ~N(mean = 8%, standard deviation = 10%/SQRT(40)). The standard error is therefore 10%/SQRT(40) = 1.58%, and Z = (10% - 8%)/1.58% = 1.26. P(Z > 1.26) = 1 - NORMSDIST(1.26) = 10.3%

70.4 C. (10.7%) The difference is, this time we use a student's t distribution where t = (10% - 8%)/standard error = (10% - 8%)/1.58% = 1.26. But now t = 1.26 with degrees of freedom (d.f.) of 40 - 1 = 39, and P(t > 1.26) with 39 d.f. is given by T.DIST.RT(1.26, 39) = 10.7%

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-70-standard-error.3650/
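A minimal Python sketch of 70.3 and 70.4 (scipy is an assumed dependency); it shows the normal versus student's t treatment of the same 1.26 test statistic:

from scipy.stats import norm, t
se = 0.10 / 40**0.5              # standard error = 10%/SQRT(40) ~= 1.58%
stat = (0.10 - 0.08) / se        # ~1.26 standard errors above the mean
print(norm.sf(stat))             # ~0.103: the CLT/normal answer (70.3)
print(t.sf(stat, df=39))         # ~0.107: the student's t answer (70.4)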


P1.T2.71. Central limit theorem (CLT)

AIM: Describe the central limit theorem.

71.1 In order for the sample mean (estimator) to be approximately normal under the central limit theorem, which of the conditions is LEAST IMPORTANT?

a) Random variables drawn from identical distributions b) Random variables drawn independently c) Sample must be large d) Population has normal distribution

71.2 The average individual annual contribution to a 401(k) plan is $1,000 with a standard deviation of $800, but the distribution is unknown. What is the probability that the average contribution from a sample of 60 employees is less than $620?

a) Need to know the distribution b) 0.01% c) 0.05% d) 0.10%

71.3 A loan portfolio contains 5,000 loans, each with a probability of default of 2.0% (i.i.d). Using CLT to approximate, what is the probability that more than 120 loans will default?

a) 0.12% b) 1.08% c) 2.17% d) 4.16%

71.4 A bank conducts a four-year back-test of 1,000 daily losses (4 years * 250 trading days). Assume the bank employs an accurate 95% confident VaR model such that the bank expects the actual loss to exceed the VaR on 5% of the days. For this sample of 1,000 days, the bank therefore expects the daily loss to exceed the 95% VaR on 50 days; i.e., 5% * 1,000 equals 50 is the expected mean of the binomial distribution. According to the central limit theorem (CLT), the sample mean for large samples is approximately normal. If the back-test observes 60 losses that exceed VaR, how many standard normal deviations is this sample mean away from the mean; i.e., what is the Z statistic?

a) Z = 1.45 b) Z = 2.33 c) Z = 6.95 d) Z = 10.0


Answers:

71.1 D. CLT says the sample mean of an i.i.d. random sample is approximately normal; i.e., it tends toward normal, but this requires a large enough sample (n > 30). The remarkable aspect of CLT is that it does NOT require the underlying distribution to be normal.

71.2 B. (0.01%) The CLT does not require us to know the distribution. The standard error = $800/SQRT(60) = $103.28; Z = (620 - 1000)/103.28 = -3.68; NORMSDIST(-3.68) = 0.01%

71.3 C. (2.17%) The approximation given by CLT says Z = (120 - 100)/SQRT(2%*98%*5000) = 2.02 standard deviations; i.e., the mean is n*p = 5,000 * 2% = 100 and the standard deviation = SQRT[p*(1-p)*n]. P(Z > 2.02) = 1 - NORMSDIST(2.02) = 2.17%. As noted by Herve (see below), the precise answer is given by the binomial CDF: P(defaults > 120) = 1 - binomial CDF = 1 - P(defaults <= 120) = 1 - BINOM.DIST(120 defaults, 5000 credits, 2% PD, TRUE = CDF) = 2.156% ... and this demonstrates the truth of the CLT: the sample mean is 2.40% (120/5000) and the normal approximation gives 2.17% (2.1676%), which is pretty near to 2.156%, the accurate probability given by the binomial.

71.4 A. (Z = 1.45) The variance of the binomial is given by p*(1-p)*T, such that the standard error is given by SQRT[p*(1-p)*T]. In this case, the standard error = SQRT[5%*95%*1000] = 6.9. As a normal approximation under CLT, Z = (60 - 50)/6.9 = 1.45. As NORMSDIST(1.45) = 92.67%, that equates to a two-tailed p value of 14.68%. As a backtest under (e.g.) 95% or 99% significance, we would not reject the null hypothesis that the model is correct; we would find that 60 exceptions could be simply due to sampling variation.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-71-central-limit-theorem-clt.3654/
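A minimal Python sketch of 71.3 comparing the CLT (normal) approximation against the exact binomial probability (scipy is an assumed dependency):

from scipy.stats import norm, binom
n, p = 5000, 0.02
mu, sd = n * p, (n * p * (1 - p)) ** 0.5   # mean 100, standard deviation ~9.9
print(norm.sf((120 - mu) / sd))            # ~0.0217: the CLT approximation
print(binom.sf(120, n, p))                 # ~0.0216: exact P[defaults > 120]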


P1.T2.72. Student’s t distribution

AIM: Describe the key properties of the t-distribution … and identify common occurrences of each distribution.

72.1. Among a sample of twenty (20) people, the average FICO credit score is 700 with a (sample) standard deviation of 20. Assume we know the population’s average FICO credit score is 690, but we do not know the population standard deviation. What is the (computed) test statistic?

a) t = -2.24 b) t = -2.18 c) Z = -2.24 d) Z = -2.18

72.2. Do we reject the null hypothesis that the true population average FICO score is 690 at, respectively, 95% and 99% confidence; i.e., Null H0: population average = 690 such that Alternative H1: population average <> 690?

a) No (@ 95%) and no (@ 99%) b) No and yes c) Yes and no d) Yes and Yes

72.3. The sample mean FICO score of 700 is an estimate (the sample mean formula, the “recipe,” is the estimator). What is the variance of this sample mean?

a) 0.88 b) 1.00 c) 1.12 d) 1.42


Answers:

72.1 A. (t = -2.2361) The standard error = 20/SQRT(20) = 4.472. The test statistic is the number of standardized standard deviations that the sample mean is from the (hypothesized) population mean. In this case, the test statistic is given by: t = (690 - 700)/4.472 = -2.2361. Please note t = ABS(-2.2361) = 2.2361 is fine, too. And we used the student's t distribution (i.e., a t test or t statistic) because we do not know the population variance/standard deviation. If we knew the population variance, we would use the normal distribution. Further, as the sample size increases, the difference becomes insignificant, such that for a large sample the normal would be an acceptable approximation. In summary, the normal is appropriate if we know the population variance, and an acceptable approximation if the sample is large.

72.2 C. (Yes and no) At 19 degrees of freedom, the 95% two-tailed critical t value is 2.093 and the 99% two-tailed critical t value is 2.861. (Did you remember to subtract one for degrees of freedom, d.f. = n - 1? Did you note this is a two-tailed test?) At 95%, 2.24 > 2.093, so we reject the null. At 99%, 2.24 is not > 2.861, so we cannot reject the null (imprecisely, we may say we “accept the null,” but truly we fail to reject the null). In short, we are 95% confident the true population mean is not 690, but we are not 99% confident: 2.24 standard deviations could occur due to sampling variation with probability of 3.75% (the p value). As the p value is the exact significance level, we could reject the null with exactly 96.25% (1 - p) confidence, less but not more.

72.3 C. (1.12) Variance of student's t = df/(df - 2) = 19/17 = 1.1176.

Note you could fairly guess, as the student's t must have a variance greater than 1.0 (i.e., it is always more dispersed than the standard normal, but not greatly so with sufficient df).
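A minimal Python sketch of 72.1 and 72.2 (scipy is an assumed dependency):

from scipy.stats import t
n, xbar, s, mu0 = 20, 700, 20, 690
stat = (xbar - mu0) / (s / n**0.5)          # ~2.236 standard errors
print(stat, 2 * t.sf(abs(stat), df=n - 1))  # test statistic and p ~= 0.0375
print(t.ppf(0.975, 19), t.ppf(0.995, 19))   # critical t: ~2.093 and ~2.861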

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-72-students-t-distribution.3656/


P1.T2.73. Chi-square distribution

AIM: Describe the key properties of the chi-square distribution… identify common occurrences of each distribution.

73.1 Each of the following are properties of the chi-square distribution EXCEPT for:

a) Non-zero (like lognormal) b) Tends to be normal as degrees of freedom (df) increase c) Variance is twice its mean d) Sum of two i.i.d. chi-square variables is a normal variable

73.2. Assume the true implied (population) volatility of a stock is 20.0%. Based on 20 price observations (n = 20), what is the NEAREST probability that a sample implied volatility will be 25.2% or greater?

a) 1.0% b) 5.0% c) 10.0% d) 15.0%

73.3 Over the past 12 months, the monthly standard deviation of a fund was 1.5%. The fund claims that its long-term (population) standard deviation is only 1.0%. If the null hypothesis is that the true population standard deviation is equal to 1.0%, should we accept this claim with 99% confidence?

a) Yes, population standard deviation may be 1.0% as critical lookup chi-square is 24.72 b) Yes, population standard deviation may be 1.0% as critical lookup chi-square is 26.76 c) No, population standard deviation is likely NOT 1.0% as critical lookup chi-square is 24.72 d) No, population standard deviation is likely NOT 1.0% as critical lookup chi-square is 26.76

73.4 Use the same assumptions as above, but revise the null hypothesis to the following: the true monthly population standard deviation is less than or equal (<=) to 1.0%. Should we accept this claim with 99% confidence?

a) Yes, population standard deviation may be less than or equal to (<=) 1.0% as critical lookup chi-square is 24.72 b) Yes, population standard deviation may be less than or equal to (<=) 1.0% as critical lookup chi-square is 26.76 c) No, population standard deviation is likely greater than (>) 1.0% as critical lookup chi-square is 24.72 d) No, population standard deviation is likely greater than (>) 1.0% as critical lookup chi-square is 26.76


Answers:

73.1 D. (A), (B) and (C) are true. The sum of i.i.d. chi-square variables is another chi-square variable with df equal to the sum of their respective dfs.

73.2 B. (4.97%) Computed chi-square value = 19*25.2%^2/20%^2 = 30.16. For 19 d.f., this is nearest to the 0.05 (5%) lookup value of 30.14 (note this is a one-tailed test, so we use the probabilities as given). The actual p-value = CHISQ.DIST.RT(30.16, 19 df) = 4.97%.

73.3 B. (Lookup chi-square = 26.76 and accept the null as the computed value is within the acceptance region) Chi-square value (the test statistic) with 11 df = 1.5%^2 * 11/1.0%^2 = 24.75. The null implies a two-tailed test; therefore, we want the chi-square left tail @ 99.5% and the chi-square right tail @ 0.5%. The critical chi-square value at 0.5% (0.005) is 26.7569. As 24.75 < 26.7569, we do not reject the null @ 99% and we conclude the claim may be correct.

73.4 C. (Lookup chi-square = 24.72 and reject the null as the computed value is outside the acceptance region) Chi-square value (the test statistic) with 11 df = 1.5%^2 * 11/1.0%^2 = 24.75. The null implies a one-tailed test; therefore, we want the chi-square right tail @ 1.0%. The critical chi-square value at 1.0% is 24.72. As 24.75 > 24.72, we reject the null @ 99% and we conclude the claim is likely incorrect. Note the two-tailed null becomes a stronger one-tailed null assertion, and we are now barely able to reject it in favor of the one-tailed alternative.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-73-chi-square-distribution.3665/
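A minimal Python sketch of 73.3 and 73.4 (scipy is an assumed dependency):

from scipy.stats import chi2
stat = 11 * 0.015**2 / 0.010**2      # (n-1)*s^2/sigma0^2 = 24.75
print(chi2.ppf(0.995, df=11))        # ~26.76: two-tailed 99% cutoff (73.3)
print(chi2.ppf(0.99, df=11))         # ~24.72: one-tailed 99% cutoff (73.4)
print(chi2.sf(stat, df=11))          # one-tailed p value, just under 1.0%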


P1.T2.74. F-distribution

AIM: Describe the key properties of the F distribution ... and identify common occurrences of each distribution.

74.1 EACH of the following is a property of the F-distribution EXCEPT for:

a) Nonzero with positive skew b) If two sample variances are drawn from identical normal populations, the F ratio will equal zero c) The square of a student's t variable with (k) degrees of freedom has an F distribution with (1,k) degrees of freedom d) The F-distribution approaches the normal distribution as both degrees of freedom increase

74.2 A proficiency exam is administered to two samples, each with 31 candidates. The sample of MBA degree holders (n = 31) produces exam scores with a standard deviation of 26. The sample of FRM certification holders (m = 31) produces exam scores with a standard deviation of 20. Can we claim that the (population) variances of the two groups are truly DIFFERENT with, respectively, 95% and 99% confidence?

a) Yes (reject null @ 95%) and Yes (reject null @ 99%) b) Yes (reject null @ 95%) and No (“accept” null @ 99%) c) No (“accept” null @ 95%) and Yes (reject null @ 99%) d) No (“accept” null @ 95%) and No (“accept” null @ 99%)


Answers: 74.1 B. The F ratio = higher sample variance/lower sample variance such that, if they are drawn from identical normal distributions, the ratio will equal one (1.0) not zero.

(A), (C), and (D) are each true about the F distribution.

74.2 D. (Accept @ 95% and 99%) F ratio = 26^2/20^2 = 1.69, with d.f. = 30 in the numerator and 30 in the denominator. At 5%, critical F = 1.84 = FINV(5%, 30, 30). At 1%, critical F = 2.39 = FINV(1%, 30, 30). The F ratio is within the acceptance region at both levels; i.e., we cannot reject the null hypothesis that the variances are drawn from identical populations. The p value = 7.8% = FDIST(1.69, 30, 30); i.e., we can reject the null at 92.2% (1 - p) or lower confidence, but not higher confidence.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-74-f-distribution.3667/
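A minimal Python sketch of 74.2 (scipy is an assumed dependency):

from scipy.stats import f
ratio = 26**2 / 20**2          # larger sample variance over smaller = 1.69
print(f.ppf(0.95, 30, 30))     # critical F at 5% ~= 1.84
print(f.ppf(0.99, 30, 30))     # critical F at 1% ~= 2.39
print(f.sf(ratio, 30, 30))     # p value ~= 0.078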


P1.T2.75. Confidence interval

AIM: Describe the concept of statistical inference, including estimation and hypothesis testing. Define and distinguish an estimator and a parameter. Define and distinguish between point estimate and interval estimation.

75.1 Which of the following LEAST resembles an estimator?

a) Average regression intercept of a sample of hedge funds (fund returns versus common factor returns) used to estimate alpha of long/short hedge fund strategy

b) Slope of regression of monthly security returns against excess market returns over previous five years is used to estimate CAPM beta

c) Average of implied volatility at various strike prices is used to estimate implied volatility of security

d) Annualized standard deviation of three year price history used to estimate current volatility

75.2 Assume a confidence interval is defined as follows: SAMPLE MEAN +/- MARGIN OF ERROR. Which of the following will DECREASE the margin of error?

a) Smaller sample size b) Greater confidence level c) Larger sample standard deviation d) More degrees of freedom


Answers:

75.1 C. Alpha (intercept of the sample regression function, SRF), beta (slope of the SRF), and current volatility are each estimates (estimates are values produced by estimators) of the population's true “unknowable” parameters. They will vary based on the sample selected; e.g., alpha based on the sample of funds among the population of long/short funds; beta based on the historical window selected (3 years? 5 years?); volatility based on the window selected. But whereas we can re-sample the other estimates by varying (or drawing additional) samples, we can't really re-sample the implied volatility. As Dowd says, “It is important to appreciate that implied volatility is not some backward-looking econometric volatility estimate, but a forward-looking forecast of volatility over the option's term to maturity.” Unlike historical standard deviation, where sampling variation due to historical time frames produces different estimates (i.e., of the “true” current volatility), variation in implied volatility due to either (i) strike price or (ii) maturity presumes to reflect fundamental volatility smile/skew and term structure dynamics rather than sampling variation.

75.2 D. (More degrees of freedom -> lower critical t -> decreased margin of error)
Confidence interval = sample mean +/- margin of error = sample mean +/- [critical t * standard error], where standard error = SQRT[sample variance/n]. Therefore:

 A smaller sample size increases the standard error *and* increases the critical t, and consequently increases the margin of error.

 A greater confidence level increases the critical t and consequently increases the margin of error.

 A larger sample standard deviation increases the standard error and so increases the margin of error.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-75-confidence-interval.3673/


P1.T2.76. Critical t-values

AIM: Define and interpret critical t-values.

76.1 Among a sample of 26 hedge funds selected by a fund of hedge fund (FoHF) manager, the average alpha is 3.35% with a sample standard deviation of 10%. What is the probability that the average hedge fund adds positive alpha; i.e., probability that alpha is greater than zero?

a) 5% b) 68% c) 90% d) 95%

76.2 Assume the 95% relative value at risk (VaR) for a position is 32.90% because the volatility is 20% and 20% * 1.645 = 32.90%, and we assume a normal distribution (note that a 1.645 deviate at 95% already betrays the normal assumption). If we instead want to treat the volatility as a SAMPLE VOLATILITY based on a sample size of twenty (n = 20), what is the analogous 95% relative VaR under the assumption of a student's t distribution?

a) 32.90% b) 33.20% c) 34.58% d) 35.62%

76.3 Which most nearly describes a COMPUTED Student’s t statistic (test statistic)?

a) Number of standard errors separating an observed sample mean from the hypothetical mean b) Distance from sample mean to hypothetical mean c) Standard error if the assumption is a student's t distribution d) The probability of obtaining a sample mean greater than the critical t value

76.4 Which of the following will INCREASE the critical t value, all other things being the same?

a) Increase sample b) Increase degrees of freedom c) Increase confidence level d) Increase the significance level


Answers:

76.1 D. (95.0%) The test t-statistic = (3.35% - 0 null)/[10%/SQRT(26)] = 1.708. If we look at the row where d.f. = 25, then 1.708 corresponds to the column at 5% (one-tail)/10% (two-tail). Since this is a one-tail test, the p-value is 5.0%. Put simply, 3.35% is ~1.71 standard errors (standard deviations) from 0% (the null hypothesis). Outcomes of 1.71 or greater are 5.0% likely due to random sampling variation. This p-value of 5% (i.e., the exact significance level) corresponds to a 95% confidence level: we can be 95% confident that +1.71 standard errors (from zero) is statistically significant (i.e., not due to random sampling variation).

76.2 C. (34.58%) The student's t one-tailed deviate, with 19 degrees of freedom, is 1.729, and 20%*1.729 = 34.58%

76.3 A. Computed t or test statistic = (sample mean - null hypothesis mean) / [sample standard deviation/SQRT(n)] = (sample mean - null hypothesis mean) / standard error. In other words, this simply takes the difference between the observed sample mean and the null and converts it to standard units.

76.4 C.

In regard to (A) and (B), both decrease the critical t value (as it converges toward the normal distribution).

Similarly, in regard to (D), an increase in the significance level corresponds to a decrease in confidence level, and this decreases the critical t value as we can more easily reject the null (i.e., lower critical t = smaller acceptance region = larger rejection region) However, increasing the confidence level increases the critical t value; this is akin to moving TO THE RIGHT (lower significance = higher confidence) for any given row on the table.
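A minimal Python sketch of the 76.2 calculation (scipy is an assumed dependency):

from scipy.stats import norm, t
print(0.20 * norm.ppf(0.95))      # ~0.3290: the normal 95% relative VaR
print(0.20 * t.ppf(0.95, df=19))  # ~0.3458: the student's t analogue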

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-76-critical-t-values.3680/


P1.T2.77. Confidence interval

AIM: Define, calculate and interpret a confidence interval.

77.1 If we have generated a 95% confidence interval (CI) bounded by a lower (L) and upper (U) limits in order to test a null hypothesis about the population variance, using critical chi-square values, which of the following best describes our CI?

a) There is a 95% probability that the random true mean is contained by our fixed interval b) There is a 95% probability that our random interval contains the true mean c) There is a 95% probability that the random true variance is contained by our fixed interval d) There is a 95% probability that our random interval contains the true variance

77.2 Assume the 99% confidence interval for a population mean (mu) is given by: 63 < mu < 137. The sample size is 10. What is the sample standard deviation?

a) 5 b) 6 c) 25 d) 36

77.3 Assume the 99% confidence interval (CI) for a population mean, where the population variance is unknown, is bounded by a lower limit L(n) and an upper limit U(n). Let D(n) be the length of the CI for sample size n, such that D(n) = U(n) - L(n). If we double the sample size (i.e., from n to 2n), let the length of the “new” CI be given by D(2n) = U(2n) - L (2n). What is the ratio D(2n)/D(n)?

a) Less than 1/SQRT(2) b) 1/SQRT(2) = 0.707 c) SQRT(2) = 1.414 d) More than SQRT(2)


Answers:

77.1 D. Neither (A) nor (B) can be true because we are using the chi-square distribution for the test of a sample variance: student's t is used for a test of the sample mean; chi-square is used for a test of the sample variance. (C) is incorrect because the true variance is not random; rather, the interval is random.

77.2 D. (36) 63 = 100 - critical t * sample standard deviation/SQRT(n), and 137 = 100 + critical t * sample standard deviation/SQRT(n); i.e., since this is a test of the sample mean, the student's t applies and the distribution is symmetrical, so we know the sample mean is the midpoint of [63, 137]. (But a distribution isn't always symmetrical; e.g., lognormal or chi-square distributions have skew.) For 9 df, the two-tailed critical t @ 99% is 3.250, such that 137 = 100 + 3.250 * S/SQRT(10), and 37 = 3.25 * S/SQRT(10), so S = 37*SQRT(10)/3.25 = 36

77.3 A. (Less than 1/SQRT(2)) Please note that (C) and (D) can be straightaway dismissed: increasing the sample reduces the acceptance region. Because the margin of error = critical t value * sample standard deviation/SQRT(n), the margin of error scales by 1/SQRT(n) if the critical t value is constant. Put another way, if the confidence interval is constructed using a normal distribution, then the critical Z value would be unaffected by sample size and 1/SQRT(2) = 0.707 would be the ratio; i.e., as the sample size doubles, the interval multiplies by 0.707. However, in this case, the critical value ALSO DECREASES with the more than doubled d.f. (e.g., n from 10 to 20 implies d.f. from 9 to 19). So the net result is a ratio that is LESS THAN 1/SQRT(2).

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-77-confidence-interval.3683/
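A minimal Python sketch backing out the sample standard deviation in 77.2 (scipy is an assumed dependency):

from scipy.stats import t
n, mean, upper = 10, 100, 137
crit = t.ppf(0.995, df=n - 1)           # two-tailed 99% critical t ~= 3.250
print((upper - mean) * n**0.5 / crit)   # sample standard deviation ~= 36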


P1.T2.78. Estimator properties

AIMs: Describe the properties of point estimators. Distinguish between unbiased and biased estimators. Define an efficient estimator and consistent estimator.

78.1 To which property does the following refer: “The expected, or average, value of the estimator is the population parameter”?

a) Unbiased b) Minimum variance c) Efficiency d) Consistency

78.2 To which property does the following refer: “An estimator that is both unbiased and has the smallest variance”?

a) Unbiased b) Minimum variance c) Efficiency d) Consistency

78.3 To which property does the following refer: “As the sample size increases, the estimator approaches the true parameter value”?

a) Unbiased b) Minimum variance c) Efficiency d) Consistency

78.4 Which of the following is an UNBIASED estimator of the population variance?

a) Sum of [observed X(i)^2]/n b) Sum of [observed X(i)^2]/(n-1) c) Sum of [observed X(i)- average X]^2 / n d) Sum of [observed X(i)- average X]^2 / (n-1)


Answers:

78.1 A. (Unbiased) For example, the EXPECTED value of the sample mean (i.e., if we draw repeated samples, compute the mean for each sample, and observe the mean of these estimates) is the population mean.

78.2 C. (Efficient) An efficient estimator is the estimator, among the unbiased estimators, that has the minimum variance (i.e., efficient = unbiased + minimum variance).

78.3 D. (Consistency)

78.4 D. Sum of [observed X(i) - average X]^2 / (n-1). What we call the sample variance is an unbiased estimator of the population variance. In regard to (A), per Hull, when using daily returns we make two simplifying assumptions by (i) assuming the average X is zero because the period is short, and (ii) replacing the technically correct (n-1) with (n). In this way, variance as the average squared return is a convenience or near approximation to the technically correct unbiased estimator.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-78-estimator-properties.3687/
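A minimal simulation sketch of 78.4 showing that the (n - 1) divisor is unbiased while the (n) divisor is biased low (numpy is an assumed dependency; the seed and sample sizes are arbitrary):

import numpy as np
rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=2.0, size=(100_000, 10))  # true variance = 4.0
print(x.var(axis=1, ddof=1).mean())   # ~4.0: unbiased (divisor n - 1)
print(x.var(axis=1, ddof=0).mean())   # ~3.6: biased low (divisor n)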


P1.T2.79. Hypothesis testing

AIMS: Explain and apply the process of hypothesis testing. Define and interpret the null hypothesis and the alternative hypothesis. Distinguish between one-sided and two-sided hypotheses.

79.1 Assume a bank conducts a four-year (1,000 days) backtest of its 99% value at risk (VaR) model. The backtest is conducted at a 95% level of test confidence, such that the model is accepted (“accurate VaR model”) if the number of exceptions falls within a range of 4 to 17. What is the ALTERNATIVE hypothesis?

a) Losses exceed VaR with probability = 1.0% b) Losses exceed VaR with probability = 5.0% c) Losses exceed VaR with probability <> 1.0% d) Losses exceed VaR with probability <> 5.0%

79.2 If the average price-earnings (P/E) of 20 NYSE companies is 18.0 with sample standard deviation of 9.0, what is the 95% confidence interval for the “true” population average P/E ratio?

a) 12.1 < mu < 24.6 b) 13.9 < mu < 22.2 c) 16.4 < mu < 25.7 d) 19.6 < mu < 26.9

79.3 Assume we conduct a 95% significance test of a small (<30) sample mean, using the sample standard deviation. Which of the following will DECREASE the probability that we incorrectly reject a true null hypothesis?

a) Increase sample size b) Realize a mistake in calculations, and use a smaller sample standard deviation c) Realize our sample is actually 30, and decide to use a normal instead of student’s t d) None of the above


Answers:

79.1 C. The two-tailed null: losses exceed VaR with probability = 1.0%. The two-tailed alternative: losses exceed VaR with probability <> 1.0%. The 99% VaR is one-tailed, but the significance test is (in this case, given a confidence interval) two-tailed.

79.2 B. 18 +/- 2.093*9/SQRT(20) = 13.8 < mu < 22.2, where the two-tailed critical t value @ 95% and 19 df is 2.093.

79.3 D. (None of the above) The 95% confidence = 5% significance. The 5% refers to the rejection region and is the probability of a Type I error: we have a 5% chance of rejecting a true null. In regard to (A), for example, increasing the sample size will decrease the margin of error (two effects: 1. the critical value, 2. the SQRT(n) denominator) and tighten the confidence interval, for the same 5% significance level.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-79-hypothesis-testing.3695/


P1.T2.80. Confidence intervals

AIMs: Describe the confidence interval approach to hypothesis testing. Describe the test of significance approach to hypothesis testing.

80.1. The average tax refund among a (small) sample size of 18 is $1,000. The alternative hypothesis is that the population average refund is greater than $882; i.e., Ha: mu > 882. We are told the one-tailed probability value (“p value”) is 5.0%. What is the 99% (two-tailed) confidence interval for the “true” population average tax refund?

a) $803 < mu < $1,197 b) $826 < mu < $1,174 c) $839 < mu < $1,161 d) $849 < mu < $1,151

80.2 A two-variable OLS regression (Gujarati's “two variable” refers to one explanatory variable and one explained variable, and could also be called “univariate” due to the single explanatory variable) is based on only nine datapoints (n = 9) and is given by: Y = 2.1 + 0.9X, where the standard error (se) of the slope coefficient is 0.5. Is the slope significant with 95% confidence; i.e., is the slope coefficient significantly different than zero with 95% confidence?

a) no, because t statistic is only 0.2 b) no, because t statistic is only 1.8 c) yes, because t statistic is only 0.2 d) yes, because t statistic is only 1.8

80.3 In the situation above, what is a 99% confidence interval for the “true” (population) slope coefficient?

a) +0.12 < true slope < 1.68 b) -0.18 < true slope < 1.98 c) -0.48 < true slope < 2.28 d) -0.78 < true slope < 2.58


Answers:

80.1 A. For 17 d.f., the one-tailed critical t value = 1.740; i.e., one-tailed per the alternative. Since 5% is the p value, we can infer: ABS(sample - null)/standard error = 1.740; i.e., the calculated t value equals the critical t at 5%, which is how we can interpret the 5% p value. In this case, (1000 - 882)/standard error = 1.740, such that the standard error ~= 67.82. And since the standard error = sample standard deviation/SQRT(n), the sample standard deviation = 67.82*SQRT(18) = 287.74. At 99% confidence, the two-tailed critical t for 17 df is 2.898. The 99% LB of the CI = 1000 - 2.898*287.74/SQRT(18) ~= 803, and the 99% UB of the CI = 1000 + 2.898*287.74/SQRT(18) ~= 1197.
Notes: d.f. = n - 1 for a univariate test of the sample mean. Standard error = sample standard deviation/SQRT(n) per CLT. Margin of error = critical t (lookup) value * standard error. CI = sample mean +/- margin of error.

80.2 B. The null is H0: slope = 0. Test statistic = (0.9 observed - 0 null)/0.5 = 1.8. At 8 df, the two-tailed critical t (lookup t) at 95% confidence is 2.306. Since 1.8 < 2.306, we cannot reject the null and we do not find the slope significant. Put another way, the observed slope is only 1.8 standardized standard deviations from zero, and as this is less than 2.306, it could merely be due to sampling variation.

80.3 D. LB of CI = 0.90 - 3.355 * 0.5 = -0.78; UB of CI = 0.90 + 3.355 * 0.5 = 2.58

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-80-confidence-intervals.3698/
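A minimal Python sketch of 80.1 (scipy is an assumed dependency):

from scipy.stats import t
n, mean, null = 18, 1000, 882
se = (mean - null) / t.ppf(0.95, df=n - 1)      # one-tailed p = 5% implies SE ~= 67.8
crit99 = t.ppf(0.995, df=n - 1)                 # two-tailed 99% critical t ~= 2.898
print(mean - crit99 * se, mean + crit99 * se)   # ~803 and ~1197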


P1.T2.81. Type I, Type II & p value

AIMs: Define, calculate and interpret type I and type II errors. Define and interpret the p value.

81.1. Assume the sample average price-to-earnings (P/E) ratio of 18 NYSE companies is 20.0 with a standard deviation of 8.0. The null hypothesis is that the true (population) mean P/E is equal to 15.0; i.e., Null H0: mu = 15. If we decide that we want the probability of committing a Type I error to be only 1.0%, what do we decide?

a) Reject null because computed test (t) statistic is 2.652 b) Reject null because computed test (t) statistic is 2.898 c) Accept (fail to reject) null because computed test (t) statistic is 2.652 d) Accept (fail to reject) null because computed test (t) statistic is 2.898

81.2 The p-value for a two-tailed test of sample mean is 1.68%. Which of the following is true?

a) We can reject the null with 95% confidence b) We can reject the null with 99% confidence c) The probability of committing a Type II error is 98.168% d) The probability of committing a Type II error is 1.68%

81.3 If the computed test (t) statistic for a sample mean is very low (e.g., significantly less than 2.0) with a sufficient sample size, what type of error might we make?

a) Type I b) Type II c) Either Type I or Type II d) Neither Type I or Type II

81.4 Assume the same scenario at 81.1 above: n = 18; sample mean = 20.0; sample standard deviation = 8.0; null hypothesis (H0) that population mean = 15.0; and desired confidence level of 99.0%. EACH of the following changes, ceteris paribus, will DECREASE the p value, EXCEPT for:

a) Increase sample size b) Increase desired confidence level from 99.0% to 99.9% c) Decrease sample standard deviation d) Decrease population mean null hypothesis


Answers:

81.1 C. (Accept) Test (t) statistic = ABS(20 - 15)/(8/SQRT(18)) = 2.652. The two-tailed critical (lookup) t value @ 99% = 2.898. Because 2.652 < 2.898, we fail to reject the null; i.e., it falls within the acceptance region. Note: the significance level = the probability of committing a Type I error.

81.2 A. The p value is the exact significance level, such that it is also the probability of committing a Type I error. Or, we can reject the null with (1 - p value) confidence. In this case, we can reject the two-tailed null with 98.32% confidence, at the most! Therefore, we can also reject at 95% but not at 99%.

81.3 B. If the test statistic is low (e.g., below the critical value, which is typically about 2.0), we will accept (fail to reject) the null. If we accept the null, either we make a correct decision (accept a true null) or an incorrect decision (accept a false null). Therefore, the only error conditional on having accepted the null is a Type II error; at this point, we cannot have made a Type I error (and we can never be certain).

81.4 B. Changing the desired confidence level, by itself, has no impact on the p value. The others (increase the sample, decrease the sample StdDev, and decrease the population mean null hypothesis) each will decrease the p value.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/question-81-critical-t-value.2678/


P1.T2.82. Chi-square and F-ratio

AIM: Describe and interpret the chi-squared test of significance and the F-test of significance.

82.1 A fund claims volatility of 30%. From a sample of 15 observations (n=15), we compute sample volatility of 40%. What is the 95% (two-tailed) confidence interval for the “true” population volatility?

a) 26.3% < sigma < 66.1% b) 29.3% < sigma < 63.1% c) 32.3% < sigma < 60.1% d) 35.3% < sigma < 57.1%

82.2 Using the sample data as above, change the null hypothesis to “the true population volatility is less than or equal to 30%.” As such, the alternative hypothesis is “the true volatility is greater than 30%.” In short, H0: sigma^2 <=30%^2 and HA: sigma^2 > 30%^2. Do we reject the claim that the fund’s true volatility is LESS THAN OR EQUAL TO 30%?

a) No, because the computed test statistic is 24.89 b) No, because the critical (lookup) chi-square value is 23.6848 c) Yes, because the computed test statistic is 24.89 d) Yes, because the critical (lookup) chi-square value is 26.119

82.3 Assume returns for a mutual fund strategy are normally distributed. We compare the volatility of two different funds. For the first fund, a sample of 25 returns produced a sample volatility of 44.0%; n = 25, S(n) = 44%. For the second fund, a sample of 16 returns produced a sample volatility of 30.0%; m = 16, S(m) = 30%. Are the volatilities significantly different, respectively, at 95% and 99% confidence; i.e., a two-tailed test?

a) No (@95%) and No (@99%) b) No (@95%) and Yes (@99%) c) Yes (@95%) and No (@99%) d) Yes (@95%) and Yes (@99%)


Answers:

82.1 B. (29.3% < sigma < 63.1%)
Chi-square value (2.5%, 14 df) = 26.119; chi-square value (97.5%, 14 df) = 5.6287.
Lower bound: 14*40%^2/26.119 = 0.0858 variance; upper bound: 14*40%^2/5.6287 = 0.398 variance.
CI: 29.3% < population volatility < 63.1% ... we can accept (fail to reject) their claim that the true volatility is 30%.

82.2 C. (Yes, we reject the null because the test statistic of 24.89 > the critical value of 23.685) Test statistic = (n-1) * sample variance / population variance = df * sample variance / population variance. In this case, 14 * 40%^2/30%^2 = 24.8889. Because this is a one-tailed test, we want the critical chi-square @ 5%, which is 23.685.

Note: At 95%, the two-tailed fails to reject (only 2.5% in each tail) but the one-tail rejects the null.

82.3. A. (No and No) The F test statistic = 44%^2/30%^2 = 2.1511. At 95% and df = (24, 15), the critical (lookup) F value is 2.29. At 99% and df = (24, 15), the critical (lookup) F value is 3.29. In both cases, the computed F value (2.1511) is less than the critical value (2.29 or 3.29), so we fail to reject the null: we cannot say the variances/volatilities are significantly different. ... The p value = 6.4%, so, for example, we could reject the null with (1 - 6.4%) = 93.6% confidence at most, but not with more.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/l1-t2-82-chi-square-and-f-ratio.3703/
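The same numbers are easy to reproduce programmatically. Below is a minimal sketch (Python with scipy; our own verification, not part of the original answer):

```python
from scipy import stats

# 82.1: 95% two-tailed CI for volatility; n = 15 so df = 14, sample vol = 40%
df = 14
lo_var = df * 0.40**2 / stats.chi2.ppf(0.975, df)   # lower variance bound, ~0.0858
hi_var = df * 0.40**2 / stats.chi2.ppf(0.025, df)   # upper variance bound, ~0.398
print(lo_var**0.5, hi_var**0.5)                     # ~29.3% < sigma < ~63.1%

# 82.2: one-tailed chi-square test of sigma <= 30%
chi_stat = df * 0.40**2 / 0.30**2                   # ~24.889
print(chi_stat > stats.chi2.ppf(0.95, df))          # True: 24.889 > 23.685, reject

# 82.3: F-ratio test of the two sample variances with df = (24, 15)
f_stat = 0.44**2 / 0.30**2                          # ~2.1511
print(stats.f.ppf(0.95, 24, 15))                    # ~2.29 lookup value
print(stats.f.sf(f_stat, 24, 15))                   # one-tailed p value, ~6.4%
```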


Distributions (Miller Chapter 4)

P1.T2.309. Probability Distributions I, Miller Chapter 4
P1.T2.310. Probability Distributions II, Miller Chapter 4
P1.T2.311. Probability Distributions III, Miller
P1.T2.312. Mixture distributions

P1.T2.309. Probability Distributions I, Miller Chapter 4

AIMs: Describe the key properties of the uniform distribution, Bernoulli distribution, Binomial distribution ...

309.1. Next month, the short interest rate will be either 200 basis points with probability of 28.0%, or 300 basis points. What is nearest to the implied rate volatility?

a) 17.30 bps b) 44.90 bps c) 83.50 bps d) 117.70 bps

309.2. At the start of the year, a stock price is $100.00. A twelve-step binomial model describes the stock price evolution such that each month the extremely volatile price will either jump up from S(t) to S(t)*u with 60.0% probability or down to S(t)*d with 40.0% probability. The up jump (u) = 1.1 and the down jump (d) = 1/1.1; note these (u) and (d) parameters correspond to an annual volatility of about 33%, as exp[33%*SQRT(1/12)] ~= 1.10. At the end of the year, which is nearest to the probability that the stock price will be exactly $121.00?

a) 0.33% b) 3.49% c) 12.25% d) 22.70%

309.3. Assume a bank's 95.0% value at risk (VaR) model is perfectly accurate. If daily losses are independent, what is the probability that the daily loss exceeds the VaR on exactly five days out of the previous 100 trading days?

a) 9.24% b) 12.39% c) 18.00% d) 43.74%


Answers:

309.1. B. 44.90 bps. Expected rate = 28%*200 + 72%*300 = 272, and variance = (200-272)^2*28% + (300-272)^2*72% = 2,016.0 bps^2, such that standard deviation = SQRT(2,016) = 44.90 basis points.

309.2. D. 22.70%. There are 13 outcomes at the end of the 12-step binomial, with $100 as the outcome that must correspond to six up jumps and six down jumps. Therefore, $121.00 must be the outcome due to seven up jumps and five down jumps: $100*1.1^7*(1/1.1)^5 = $121.00, such that we want the binomial probability given by: Binomial Prob [X = 7 | n = 12, p = 60%] = 22.70%.

309.3. C. 18.00%. Binomial Prob [X = 5 | n = 100, p = 5.0%] = 18.00%.

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-309-probability-distributions-i-miller-chapter-4.7025/
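Both binomial probabilities can be confirmed with a short sketch (Python with scipy; our own check, not part of the source):

```python
from scipy import stats

# 309.2: seven up jumps in twelve steps with p(up) = 60%
print(stats.binom.pmf(7, 12, 0.60))    # ~0.2270

# 309.3: exactly five VaR exceptions in 100 days with p = 5%
print(stats.binom.pmf(5, 100, 0.05))   # ~0.1800
```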


P1.T2.310. Probability Distributions II, Miller Chapter 4

AIM: Describe the key properties of ... the Poisson distribution, normal distribution ... and identify common occurrences of each distribution.

310.1. A large bond portfolio contains 100 obligors. The average default rate is 4.0%. Analyst Joe assumes defaults follow a Poisson distribution but his colleague Mary assumes the defaults instead follow a binomial distribution. If they each compute the probability of exactly four (4) defaults, which is nearest to the difference between their computed probabilities?

a) 0.40% b) 1.83% c) 3.55% d) 7.06%

310.2. Assume the annual returns of Fund A are normally distributed with a mean and standard deviation of 10%. The annual returns of Fund B are also normally distributed, but with a mean and standard deviation of 20%. The correlation between the returns of the funds is +0.40. At the end of the year, Fund B has returned +30%, and Fund A has returned 12%. Which is nearest to the probability that Fund B outperforms Fund A by this much or more? (This is a variation on Miller 2.2. You will need to perform a z-lookup).

a) 7.00% b) 15.90% c) 33.22% d) 56.04%

310.3. Assume that X(1), X(2) and X(3) are three series (vectors) of 100 independent standard random normal variables; for example, each can be generated in Excel with =NORM.S.INV(RAND()). Assume also a correlation parameter, which is constant (rho). We perform the following translations on the vectors in order to produce the seven additional series, X(A) to X(G):

X(A) = SQRT(rho)*X(1) + SQRT(1-rho)*X(2) and X(B) = SQRT(rho)*X(1) + SQRT(1-rho)*X(3)

X(C) = X(1) and X(D) = rho*X(C) + SQRT(1-rho^2)*X(2)

X(E) = 3.0 + 5.0*X(1)

X(F) = X(1)*X(2)

X(G) = X(1)/X(2)

Which of the following statements is TRUE?

a) We should expect the sample correlation of X(A) with X(B), and X(C) with X(D), to be exactly equal to rho

b) We can expect the sample correlation of X(A) with X(B), and X(C) with X(D), to be approximately equal to rho

c) X(E) is no longer normal

d) X(F) and X(G) are approximately normal


Answers:

310.1. A. 0.40%. Binomial Prob [X = 4 | n = 100 and p = 4%] = 19.939%, and Poisson Prob [X = 4 | lambda = 100*4%] = 19.537%, such that the difference = 0.4022%.

310.2. C. 33.22%. E[B-A] = +10%; Standard Deviation[B-A] = SQRT(20%^2 + 10%^2 - 2*20%*10%*0.4) = 18.44%. Z = (18% - 10%)/18.44% = 0.4339, such that Prob [Z > 0.4339] = 1 - NORM.S.DIST(0.4339) = 1 - 66.78% = 33.22%.

310.3. B. We can expect the sample correlation of X(A) with X(B), and X(C) with X(D), to be approximately equal to rho. Both transformations are valid in transforming the independent series to correlated series; however, due to sampling variation, we expect the (actual) sample correlation to vary around (rho) with some dispersion.

In regard to (C) and (D), these are false.

In regard to (C), X(E) = 3.0 + 5.0*X(1) transforms a standard normal into N(3,5^2)

In regard to (D), although sums and differences of normals produce normals (summation stability property), products/divisions do not.
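For a numerical check of 310.1 and a quick simulation of 310.3, here is a minimal sketch (Python with numpy/scipy; our own illustration with an arbitrary rho of 0.5):

```python
import numpy as np
from scipy import stats

# 310.1: binomial versus Poisson probability of exactly four defaults
print(stats.binom.pmf(4, 100, 0.04) - stats.poisson.pmf(4, 4))  # ~0.0040

# 310.3: transform independent standard normals into correlated normals
rho = 0.5                                 # hypothetical correlation parameter
rng = np.random.default_rng(42)
x1, x2, x3 = rng.standard_normal((3, 100))
xa = np.sqrt(rho) * x1 + np.sqrt(1 - rho) * x2
xb = np.sqrt(rho) * x1 + np.sqrt(1 - rho) * x3
print(np.corrcoef(xa, xb)[0, 1])          # near rho, but not exact, per sampling variation
```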

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-310-probability-distributions-ii-miller-chapter-4.7036/


P1.T2.311. Probability Distributions III, Miller

AIMs: Describe the properties of linear combinations of normally distributed variables. Identify the key properties and parameters of the Chi-squared, Student’s t, and F-distributions.

311.1. George the analyst creates a model, displayed below, which generates two series of random but correlated asset returns. Both asset prices begin at a price of $10.00 with a periodic mean return of +1.0%. Series #1 has periodic volatility of 10.0% while Series #2 has periodic volatility of 20.0%. The desired correlation of the simulated series is 0.80. Each series steps according to a discrete version of geometric Brownian motion (GBM) where price(t+1) = price (t) + price(t)*(mean + volatility*standard random normal). Two standard random normals are generated at each step, X(1) and X(2), but X(2) is transformed into correlated Y(1) with Y(1) = rho*X(1) + SQRT(1 - rho^2)*X(2), such that Y(1) informs Series #2. The first five steps are displayed below:

At the fourth step, when the Series #1 Price = $10.81, what is Y(1) and the Series #2 Price [at Step 4], both of which cells are highlighted in orange above?

a) -0.27 and $9.08 b) +0.55 and $9.85 c) +0.98 and $11.32 d) +2.06 and $12.40


311.2. Sue the risk manager is modeling a random set of 10-day asset returns. The expected annual return (drift) of the asset is 10.0% with annual return volatility of 30.0%. She assumes a discrete version of GBM such that the 10-day return = expected return*(10/250) + volatility*SQRT(10/250)*(random deviate); i.e., drift scales linearly with time and volatility scales with the square root of time. For example, assuming a normal distribution, the random uniform variable (from 0 to 1.0) on the first trial happens to be a low 0.0250. Per an inverse transformation, this corresponds to a random normal deviate of about -1.960, such that the modeled 10-day return = +10%*10/250 + 30%*SQRT(10/250)*(-1.960) ~= -11.360%. Sue wants to model heavier tails, so she replaces the assumption of a normal distribution with the assumption of a student's t distribution that has only three degrees of freedom; i.e., 3 d.f. or k = 3. Under this alternative assumption, given the same random uniform variable of 0.0250, what is the modeled 10-day return? (please note: this requires a student's t lookup)

a) -18.69% b) -15.52% c) -11.40% d) -7.53%

311.3. In regard to the chi-squared (U^2) and F-distribution, each of the following is true EXCEPT which is false?

a) Both the chi-squared and F-distribution are non-negative and positively skewed to the right

b) The square of a variable with a student's t distribution with d.f. = k has an F-distribution with 1 and k d.f.; that is, t(k)^2 ~= F(1,k)

c) The sum of two independent chi-squared variables, with respectively k1 and k2 degrees of freedom, is itself chi-squared with (k1+k2) degrees of freedom

d) The chi-squared distribution is approximated by the ratio of two independent F-distributions; i.e., U^2(k) ~ F(m,n)/F(p,q)


Answers:

311.1. C. 0.98 and $11.32. Correlated Y(1) = 0.80*1.02 + SQRT(1-0.80^2)*0.28 = 0.9840; i.e., the standard random normal 0.28 is transformed into another, correlated standard random normal of 0.98. The Series #2 Price [Step 4] = $9.38 + 9.38*(1% + 0.9840*20%) = $11.3198.

311.2. A. -18.69%. 10-day return [normal] = +10%*10/250 + 30%*SQRT(10/250)*(-1.960); and 10-day return [student's t with 3 d.f.] = +10%*10/250 + 30%*SQRT(10/250)*(-3.18245) = -18.6947%, as T.INV(0.0250, 3 df) = -3.18245. Note the implied heavier tail.

311.3. D. The F-distribution is the distribution of the ratio of two independent chi-squared random variables; i.e., F(k1,k2) ~= (U1^2/k1) / (U2^2/k2).

Note that the F-distribution has two degrees-of-freedom parameters while the chi-squared distribution has only one degree-of-freedom parameter, k (such that its mean is k and its variance is 2k).

In regard to (A), (B), and (C), each is TRUE.
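The inverse-transform step in 311.2 can be checked with scipy (a sketch of our own, not part of the source, which uses the Excel T.INV function):

```python
from scipy import stats

u = 0.0250                                # the random uniform draw
z_norm = stats.norm.ppf(u)                # ~ -1.960
z_t3 = stats.t.ppf(u, df=3)               # ~ -3.18245

ten_day = lambda z: 0.10 * 10/250 + 0.30 * (10/250)**0.5 * z
print(ten_day(z_norm))                    # ~ -11.36%
print(ten_day(z_t3))                      # ~ -18.69%: the heavier t tail bites
```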

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-311-probability-distributions-iii-miller.7066/


P1.T2.312. Mixture distributions

AIM: Describe a mixture distribution and explain the creation and characteristics of mixture distributions.

312.1. A random variable X has a density function that is a normal mixture with two independent components: the first normal component has an expectation (mean) of 4.0 with variance of 16.0; the second normal component has an expectation (mean) of 6.0 with variance of 9.0. The probability weight on the first component is 0.30 such that the weight on the second component is 0.70. What is the probability that X is less than zero; i.e., Prob [X<0]?

a) 0.015% b) 1.333% c) 6.352% d) 12.487%

312.2. Analyst Susan wants to employ a mixture distribution in order to characterize the regime-switching dynamic of markets. Specifically, she wants to assume that 90% of the time equity returns are characterized by a "stable" normal distribution with a positive expected return but a low volatility (standard deviation); however, the other 10% of the time equity returns have a negative expected return but a high volatility. About the mixture distribution, each of the following is true except for which is false?

a) Per summation stability, if the components are normal, then the mixture distribution will also be normal

b) It is possible for the mixture distribution to exhibit negative skew and heavy tails (leptokurtosis; i.e., excess kurtosis > 0)

c) Susan can assume non-normal but parametric component distributions

d) Susan can add additional component distributions (i.e., probability density functions), but in order to produce a legitimate mixture distribution, the sum of the component weights must equal one

312.3. Assume a normal mixture distribution, which characterizes random variable X, is a mixture of two independent normal components: N(2,1) and N(-2,1). That is, the components have the same variance, 1.0, but their expectations (means) differ: -2.0 and +2.0. The components are equally weighted. In regard to the mixture distribution, not the components, each of the following is true EXCEPT for which is false?

a) The mixture distribution is bimodal b) The mixture distribution has a mean of zero c) The variance of the mixture distribution is greater than 1.0 d) The probability that X is less than 2.0 is approximately 50%.


Answers:

312.1. C. 6.352%. Because the normal mixture distribution function is a probability-weighted sum of its component distribution functions, it is true that: Prob(mixture)[X < 0] = 0.30*Prob(1st component)[X < 0] + 0.70*Prob(2nd component)[X < 0]. In regard to the 1st component, Z = (0-4)/SQRT(16) = -4/4 = -1.0. In regard to the 2nd component, Z = (0-6)/SQRT(9) = -6/3 = -2.0. Such that: Prob(mixture)[X<0] = 0.30*Prob[Z < -1.0] + 0.70*Prob[Z < -2.0] = 0.30*15.87% + 0.70*2.28% = 4.760% + 1.593% = 6.352%.

312.2. A. False. Summation stability refers to the fact that ADDING normal random variables (distributions) produces a normal random variable; e.g., adding consecutive daily log returns implies a normal n-day log return. A convolution (the sum of random variables) is different from a mixture: http://en.wikipedia.org/wiki/Mixture_distribution

Miller on the difference between convolution and mixture:

"Note that the two-step process is not the same as the process described in a previous section for adding two random variables together. An example of adding two random variables together is a portfolio of two stocks. At each point in time, each stock generates a random return, and the portfolio return is the sum of both returns. In the case we are describing now, the return appears to come from either the low-volatility distribution or the high-volatility distribution. Adding the probability density functions is not the same as adding random variables."

In regard to (B), (C), and (D), each is TRUE.

In regard to (B), it is likely under these assumptions that the mixture will exhibit negative skew and kurtosis > 3.0

312.3. D. Prob[X<2] = 50%*Prob[Z<0] + 50%*Prob[Z<4] = 50%*50% + 50%*~100% ~= 75%

In regard to A, B, and C, each is TRUE.
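Since the mixture CDF is the weighted sum of its component CDFs, both probabilities can be verified directly (a minimal sketch in Python with scipy; our own check):

```python
from scipy import stats

# 312.1: Prob[X < 0] for the 0.30/0.70 mixture of N(4, 16) and N(6, 9)
p1 = 0.30 * stats.norm.cdf(0, loc=4, scale=4) + 0.70 * stats.norm.cdf(0, loc=6, scale=3)
print(p1)   # ~0.0635

# 312.3: Prob[X < 2] for the equally weighted mixture of N(2, 1) and N(-2, 1)
p2 = 0.5 * stats.norm.cdf(2, loc=2, scale=1) + 0.5 * stats.norm.cdf(2, loc=-2, scale=1)
print(p2)   # ~0.75
```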

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-312-mixture-distributions.7103/


Distributions (Rachev Chapters 2 & 3)

P1.T2.110. Rachev’s distributions
P1.T2.111. Binomial & Poisson
P1.T2.112. Rachev’s properties of normal distribution
P1.T2.113. Rachev’s exponential
P1.T2.114. Weibull distribution
P1.T2.115. Gamma distribution (Rachev)
P1.T2.116. Beta distribution (Rachev)
P1.T2.117. Chi-square distribution
P1.T2.118. Student’s t distribution
P1.T2.119. Lognormal distribution
P1.T2.120. Logistic distribution
P1.T2.121. Extreme value distributions
P1.T2.122. Stable distributions
P1.T2.123. Hazard rate of exponential variable
P1.T2.124. Exponential versus Poisson
P1.T2.125. Generalized Pareto distribution (GPD)
P1.T2.126. Mixtures of distributions

P1.T2.110. Rachev’s distributions

AIM: Describe the key properties of the Bernoulli distribution, Binomial distribution, and Poisson distribution, and identify common occurrences of each distribution.

110.1. Which distribution is discrete?

a) Bernoulli b) Binomial c) Poisson d) All of the above

110.2 Which distribution has more than one parameter?

a) Bernoulli b) Binomial c) Poisson d) All of the above

110.3 Which is most likely to characterize the frequency of operational losses?

a) Bernoulli b) Binomial c) Poisson d) All of the above


110.4 Which is most likely to characterize the probability of default (PD) for a single bond?

a) Bernoulli b) Binomial c) Poisson d) All of the above

110.5 Which is most likely to characterize the probability that a 3rd-to-default (nth-to-default) basket credit default swap (CDS) will be triggered under the dubious assumption of default independence?

a) Bernoulli b) Binomial c) Poisson d) All of the above

110.6 Which distribution does NOT tend to approximate the normal as one of its parameters increases?

a) Bernoulli b) Binomial c) Poisson d) All of the above


Answers:

110.1. D. All of the above. Bernoulli, binomial and Poisson are each a discrete distribution.

110.2. B. Binomial. Bernoulli has one parameter (p) and Poisson has one parameter (lambda).

110.3. C. Poisson

110.4. A. Bernoulli

110.5. B. Binomial. The binomial assumes i.i.d. Bernoulli trials; the independence is typically a dubious assumption.

110.6. A. Bernoulli. Binomial tends to normal as n increases; Poisson tends to normal as lambda increases.

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-110-rachevs-distributions.3964/


P1.T2.111. Binomial & Poisson

AIM: Identify the distribution functions of Binomial and Poisson distributions for various parameter values.

111.1 A 3rd-to-default basket credit default swap (basket CDS) contains 40 credits. The credits are independent (default correlation = 0) and each credit has a PD = 2.0%. What is the probability that the basket CDS will be triggered?

a) 0.80% b) 2.33% c) 4.57% d) 80.0%

111.2 Given the same basket CDS (n=40, pd = 2%) but instead assume perfect default correlation (correlation = 1.0). What is the probability that the 3rd-to-default basket CDS will be triggered?

a) 2.0% b) 6.0% c) 16.0% d) 100%

111.3 A bank’s 99% VaR model is perfectly accurate. The bank conducts a two-year backtest (500 trading days). The expected number of exceptions (a.k.a., exceedences; days on which the loss exceeds VaR) is therefore 1% * 500 = 5. If the daily losses are i.i.d., what is the probability that the backtest produces exactly five (5) exceptions over the period?

a) 14.0% b) 17.6% c) 44.0% d) 61.6%

111.4 At a major broker-dealer, there occurs on average a high-frequency, low-severity (HFLS) operational loss five times per day. According to a Poisson distribution, on a given single day, what is the probability that at least one operational loss will occur?

a) 89.3% b) 91.3% c) 95.3% d) 99.3%


111.5 If the average rate of HFLS operational loss events instead is 20 per workweek (20 every five days), if we assume a Poisson distribution, what is the probability on a given single day that between four and six HFLS losses will occur, inclusive; i.e., [4,6] not (4,6)?

a) 35.6% b) 45.6% c) 55.6% d) 65.6%


Answers:

111.1. C. 4.57%. P(X=0) = 44.57%; P(X=1) = 36.384%; P(X=2) = 40!/(2!*38!)*2%^2*98%^38 = 0.1448 = 14.479%. P(3 or more defaults) = 1 - P(2 or fewer defaults) = 1 - 44.57% - 36.384% - 14.479% = 4.57%.

111.2. A. 2.0%

With perfect correlation, there are only two outcomes: all default (2%) or none default (98%). Please note: this 3rd-to-default basket CDS behaves like the junior (equity) tranche of a CDO. An increase in correlation (from 0 to 1.0) causes a decrease in the risk of the tranche.

111.3. B. 17.6%. =BINOM.DIST(X=5, n=500, p=1%, pdf=false) = 17.6%.

111.4. D. 99.3%. P(X>0) = 1 - P(X=0) = 1 - 5^0*EXP(-5)/0! = 1 - EXP(-5) = 99.33%.

111.5. B. The rate is 20/5, or 4 per day, so lambda is 4. P(X=4) = 19.537%, P(X=5) = 15.629%, P(X=6) = 10.42%. P(X=4) + P(X=5) + P(X=6) = 19.537% + 15.629% + 10.42% = 45.586%.

Q&A from the forum: “Can you walk me through 111.5?”

Answer from David: “Assuming Poisson, we want the only parameter it has: lambda. Lambda in Poisson (http://en.wikipedia.org/wiki/Poisson_distribution) is the average or expected number of events that occur in a time interval. As the question asks about the number per day, we want to convert our average of 20 per week into 4 per day. So, the essential first step is to see that we want to use a lambda of 4 in a Poisson distribution. Then, the answer is ‘merely’ a matter of calculating the probability, conditional on lambda of 4, that the number of losses in a day is: four, or five, or six. As the Prob[N=x] = lambda^x * exp(-lambda) / x!: Prob[N=4] = 4^4 * exp(-4) / 4! = 19.5%; Prob[N=5] = 4^5 * exp(-4) / 5! = 15.6%; Prob[N=6] = 4^6 * exp(-4) / 6! = 10.4%. These are mutually exclusive (either/or) so we just add them up. I hope that helps, David.”

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-111-binomial-poisson.3968/
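A quick scipy check of the binomial and Poisson answers above (our own sketch, not part of the original):

```python
from scipy import stats

# 111.1: P[3 or more defaults] with n = 40, PD = 2%
print(1 - stats.binom.cdf(2, 40, 0.02))                  # ~4.57%

# 111.3: exactly five exceptions in 500 days at p = 1%
print(stats.binom.pmf(5, 500, 0.01))                     # ~17.6%

# 111.5: Poisson with lambda = 4 per day, P[4 <= N <= 6]
print(sum(stats.poisson.pmf(k, 4) for k in (4, 5, 6)))   # ~45.6%
```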


P1.T2.112. Rachev’s properties of normal distribution

AIM: Describe the key properties of normal distribution (Rachev not Gujarati)….

112.1 If a normal random variable has both mean and variance of 100, what is approximately the probability that the variable falls between 76.7 and 83.6 (no calculator required)?

a) 2.0% b) 3.0% c) 4.0% d) 5.0%

112.2 Let (C) be a random normal variable that characterizes the temperature in degree Celsius. Assume (C) has mean of 30.0 and standard deviation of 2.0. Let (F) be the corresponding temperature in degrees Fahrenheit given by F = 1.8*C + 32. Which of the following statements is the best direct reflection of the location-scale invariance property of the normal distribution?

a) If mean of C were instead 0 with variance of 1, then C would be a standard normal b) Standard deviation of F is 2.0 c) Standard deviation of F is 3.6 d) F is normally distributed

112.3 If the daily returns of the S&P 500 are independent and normally distributed (i.i.d. normal), which property of the normal confirms that weekly and monthly returns are also normal?

a) Location-scale invariance b) Summation stability c) Square root rule d) Domain of attraction

112.4. Which of the following distributions is stable with its own domain of attraction?

a) Normal distribution b) Cauchy distribution c) Levy distribution d) All of the above


Answers:

112.1. C. 4.0%. (76.7 - 100)/10 = -2.33; (83.6 - 100)/10 = -1.64. Prob [-2.33 deviates < X < -1.64 deviates] = 5% - 1% = 4%.

112.2. D. F is normally distributed. (C) is a true statement: F = 1.8*C + 32 and variance(F) = variance(1.8*C + 32) = 1.8^2*variance(C) = 1.8^2*2^2 = 12.96, and standard deviation(F) = SQRT(12.96) = 3.6. So, it is instructive to find the variance of F ... but this does not illustrate the location-scale invariance property of the normal, rather a simple property of variance.

Rachev: “One important property is the so-called location-scale invariance of the normal distribution. What does this mean? Imagine that you have random variable X, which is normally distributed with the parameters µ and σ. Now we consider the random variable Y, which is obtained as Y = aX + b. In general, the distribution of Y might substantially differ from the distribution of X, but in the case where X is normally distributed, the random variable Y is again normally distributed with parameters aµ + b and aσ. Thus, we do not leave the class of normal distributions if we multiply the random variable by a factor or shift the random variable. This fact can be used if we change the scale where a random variable is measured: Imagine that X measures the temperature at the top of the Empire State Building on January 1, 2006, at 6 A.M. in degrees Celsius. Then Y = 9/5*X + 32 will give the temperature in degrees Fahrenheit, and if X is normally distributed then Y will be too.”

112.3. B. Summation stability. Rachev: “If you take the sum of several independent random variables, which are all normally distributed with mean µ(i) and standard deviation σ(i), then the sum will be normally distributed again. Why is the summation stability property important for financial applications? Imagine that the daily returns of the S&P 500 are independently normally distributed with µ = 0.05% and σ = 1.6%. Then the monthly returns again are normally distributed with parameters µ = 1.05% and σ = 7.33% (assuming 21 trading days per month) and the yearly return is normally distributed with parameters µ = 12.6% and σ = 25.40% (assuming 252 trading days per year). This means that the S&P 500 monthly return fluctuates randomly around 1.05% and the yearly return around 12.6%.”

112.4. D. All three. Rachev: “The last important property that is often misinterpreted to justify the nearly exclusive use of normal distributions in financial modeling is the fact that the normal distribution possesses a domain of attraction. A mathematical result called the central limit theorem states that—under certain technical conditions—the distribution of a large sum of random variables behaves necessarily like a normal distribution. In the eyes of many, the normal distribution is the unique class of probability distributions having this property. This is wrong and actually it is the class of stable distributions (containing the normal distributions), which is unique in the sense that a large sum of random variables can only converge to a stable distribution. We discuss the stable distribution in Chapter 7.”

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-112-rachevs-properties-of-normal-distribution.3979/


P1.T2.113. Rachev’s exponential

AIM: Describe the key properties of the exponential distribution (Rachev)

113.1 Each of the following is TRUE about the exponential distribution EXCEPT for:

a) Like the lognormal, it is non-negative b) Like the Poisson, it has only one parameter c) Like the Weibull, it is continuous d) Like the binomial, it can be either light- or heavy-tailed

113.2 If the hazard rate (a.k.a., default intensity) of a bond is 0.025 (2.5%), what is the cumulative probability of default (cumulative PD) over seven years according to the exponential distribution?

a) 11.75% b) 13.93% c) 16.05% d) 18.13%

113.3 If the exponential CDF is given by F(x) = 1 - EXP(-lambda*x), where lambda is 0.4, what are the mean and standard deviation (sigma)?

a) 0.40 mean and 2.50 sigma b) 0.40 mean and 6.25 sigma c) 2.50 mean and 2.50 sigma d) 2.50 mean and 6.25 sigma

113.4 If the default intensity (hazard rate) of a bond is constant at 1.0%, what is the conditional probability of default during the fifth year; i.e., 5-year conditional PD (this question is level 2 difficulty)?

a) 1.0% b) 1.3% c) 1.6% d) 1.9%


Answers:

113.1. D. In regard to (A), (B), and (C), the exponential is non-negative, has only one parameter, and is continuous. While the binomial can be light- or heavy-tailed, the exponential has kurtosis = 9 (excess kurtosis = 6).

113.2. C. 16.05%. 1 - EXP(-2.5%*7) = 1 - EXP(-0.175) = 16.05%.

113.3. C. 2.5 and 2.5. Mean = 1/lambda = 1/0.4 = 2.5. Variance = 1/lambda^2, such that standard deviation = SQRT(1/0.4^2) = 2.5.

113.4. A. 1.0%. Cumulative 5-year PD = 1 - EXP(-1%*5) = 4.877%. Cumulative 4-year PD = 1 - EXP(-1%*4) = 3.921%. Unconditional PD in year 5 = 4.877% - 3.921% = 0.956%. Conditional PD in year 5 = 0.956% / (1 - 3.921%) = 0.9950% ~= 1.00%.

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-113-rachevs-exponential.3994/
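The exponential arithmetic is simple enough to check by hand, but a minimal Python sketch (our own, for completeness) may help:

```python
import math

# 113.2: cumulative PD over seven years with hazard rate 2.5%
print(1 - math.exp(-0.025 * 7))      # ~16.05%

# 113.4: conditional PD during year five with hazard rate 1.0%
cum5 = 1 - math.exp(-0.01 * 5)       # ~4.877%
cum4 = 1 - math.exp(-0.01 * 4)       # ~3.921%
print((cum5 - cum4) / (1 - cum4))    # ~1.00%
```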


P1.T2.114. Weibull distribution

AIM: Describe the key properties of Weibull distribution.

114.1 The Weibull CDF is given by F(x) = 1 - EXP[-(x/B)^a], where (B) is scale and (a) is shape. If scale (B) = 1.0 and shape (a) = 0.5, what is the Weibull probability that the random variable will be less than 0.8?

a) 40.9% b) 49.9% c) 59.12% d) 89.44%

114.2 Which of the following Weibull distributions has the heaviest tail, given the scale (B) parameter and the shape (a) parameter?

a) a = 0.5, B = 1.0 b) a = 1.0, B = 1.0 c) a = 1.0, B = 2.0 d) a = 2.0, B = 1.0

114.3 EACH of the following is TRUE about the Weibull distribution EXCEPT for:

a) It can be light- or heavy-tailed b) If shape parameter (a) is 1.0, reduces to the exponential distribution c) If shape parameter (a) is LESS THAN 1.0, Weibull exhibits heavy tails d) It can have heavy tails but not heavier than the exponential distribution

114.4 If the default intensity of a noninvestment-grade bond is decreasing with time, the Weibull may be better than the exponential to model the effect, which is called:

a) Teething troubles b) Aging effect c) Light-tailed d) Heavy-tailed


Answers:

114.1. C. 59.12%. 1 - EXP[-(0.8^0.5)] = 59.12%.

114.2. A. a = 0.5, B = 1.0. Shape (a) < 1.0 implies heavy tails.

114.3. D. The exponential distribution has kurtosis = 9 (excess kurtosis = 6), which is heavy-tailed. But the Weibull shape parameter (a) gives it more flexibility for heavier or less heavy tails. In regard to (A), (B), and (C), each is true: if shape (a) < 1.0, the Weibull has a heavy tail; if (a) > 1.0, the Weibull has a light tail.

114.4. A. Teething troubles. Rachev: “The main difference to the case of an exponential distribution is the fact that the default intensity depends upon the point in time t under consideration. For α > 1—also called the “light-tailed” case—the default intensity is monotonically increasing with increasing time, which is useful for modeling the “aging effect” as it happens for machines: The default intensity of a 20-year old machine is higher than the one of a 2-year old machine. For α < 1—the “heavy-tailed” case—the default intensity decreases with increasing time. That means we have the effect of “teething troubles,” a figurative explanation for the effect that after some trouble at the beginning things work well, as it is known from new cars. The credit spread on noninvestment-grade corporate bonds provides a good example: Credit spreads usually decline with maturity. The credit spread reflects the default intensity and, thus, we have the effect of “teething troubles.” If the company survives the next two years, it will survive for a longer time as well, which explains the decreasing credit spread. For α = 1, the Weibull distribution reduces to an exponential distribution with parameter β.”

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-114-weibull-distribution.4003/
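The 114.1 calculation in one line of Python (our own check of the Weibull CDF given in the question):

```python
import math

# 114.1: Weibull CDF F(x) = 1 - exp[-(x/B)^a] with B = 1.0, a = 0.5, x = 0.8
a, B, x = 0.5, 1.0, 0.8
print(1 - math.exp(-((x / B) ** a)))   # ~0.5912
```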


P1.T2.115. Gamma distribution (Rachev)

AIM: Describe the key properties of ... gamma distribution (Rachev)

115.1 Among the following use cases, where are we most likely to encounter the gamma distribution?

a) Short-horizon market risk value at risk (profit and loss) b) To conduct a VaR backtest c) Credit portfolio losses d) Frequency of operational losses

115.2 EACH of the following is TRUE about the gamma distribution EXCEPT for:

a) It has two parameters, alpha and beta, like the beta distribution
b) It reduces to a discrete distribution if alpha is less than zero
c) It can reduce to the exponential distribution (alpha = 1.0)
d) It can reduce to the chi-square distribution

115.3 EACH of the following is TRUE about the gamma distribution EXCEPT for:

a) It has an infinite variance
b) It always has a heavy tail
c) It always has positive skew
d) Its mean is the product of its two params; mean = shape*scale


Answers:

115.1. C. Credit portfolio losses. De Servigny shows how the gamma distribution can be a good characterization of credit portfolio losses; e.g., CreditRisk+ utilizes the gamma distribution.

In regard to (A), this is unlikely because gamma is non-negative

In regard to (B), as Jorion shows, this rather calls for a binomial (i.e., to exceed or not exceed the VaR is a Bernoulli; a series of Bernoullis is a binomial).

In regard to (D), this suggests rather a discrete distribution

115.2. B. Gamma is continuous; both params must be greater than zero. (Gamma does have a discrete “cousin”: the negative binomial distribution.)

In regard to (A), (C), and (D), each is TRUE.

115.3 A. Variance = shape*scale^2

In regard to (B), (C), and (D), these are all TRUE.

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-115-gamma-distribution-rachev.4010/


P1.T2.116. Beta distribution (Rachev)

AIM: Describe the key properties of the beta distribution (Rachev)...

116.1 Among the following use cases, where are we most likely to find the beta distribution?

a) Exposure at default (EAD) b) Probability of default (PD) c) Loss given default (LGD) d) Jump to default

116.2 EACH of the following is TRUE about the beta distribution EXCEPT for:

a) Like the Weibull distribution, beta has two parameters
b) Beta has positive skew
c) If both parameters equal one (alpha = beta = 1.0), beta reduces to the uniform distribution
d) Rachev says beta can be used to model the unknown probability of credit rating migration (transition)

116.3 Which is a weakness of the beta distribution?

a) It has an undefined variance
b) It is difficult to constrain to the domain (support) interval [0,1] which constitutes a probability
c) It cannot model heavy tails
d) It cannot model bimodal distributions, which have been empirically observed in recoveries


Answers:

116.1. C. Loss given default (LGD) or recovery rate. At least in the FRM, the beta is most commonly encountered as a popular distribution for modeling LGD due to its flexibility; e.g., by varying the mean, it can model senior (lower LGD) or junior obligations (higher LGD).

116.2. B. Skew can be negative, positive or zero.

In regard to (A), (C), and (D), each are true.

116.3. D. Beta distribution is unimodal.

In regard to (A), the beta distribution has a defined variance.

In regard to (B), this is false: an advantageous FEATURE of the beta is its natural support for [0,1]

In regard to (C), beta can model light- or heavy-tails

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-116-beta-distribution-rachev.4018/


P1.T2.117. Chi-square distribution

AIM: L1.T2.117. Describe the key properties of Chi-squared

117.1 Let Z be a standard normal random variable; Z ~ N(0,1). Let X = Z^2, which implies that X is distributed as a chi-square variable with one degree of freedom (1 d.f.). Without using the lookup table or Excel, what is the critical chi-square value for X at 10% significance? In other words, at what critical value (L) is the probability (X > L) = 10%?

a) -1.65 b) +1.65 c) + 2.70 d) - 2.70

117.2 Let Z(i) be a standard normal random variable, Z ~ N(0,1) and X(i) = Z(i)^2. Assume a series of ten independent normal variables: Z(1), Z(2), ... Z(10). If we square each Z(i) and sum the series, the resulting summation is given by C(10) = X(1) + X(2) + ... X(10), which follows a chi-square distribution. What is the variance of C(10)?

a) 9.0 b) 10.0 c) 18.0 d) 20.0


Answers:

117.1. C. +2.70. Per the standard normal, we know that the probability (Z < -1.645) = 5% and the probability (Z > +1.645) = 5%. This implies, from the left tail, Probability [Z^2 > (-1.645)^2] = 5% and, from the right tail, Probability [Z^2 > (+1.645)^2] = 5%. Together, the Probability (Z^2 > 2.7055) = 10%. Please note that, per the squaring of the normal variable, the chi-square distribution must be non-negative (answers A and D are impossible).

117.2. D. Variance of C(10) = 20. The expected value of a chi-square r.v. = d.f. and the variance is 2*d.f.; in this case, the variance of C(10) = 2*10 = 20, since the degrees of freedom is 10.

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-117-chi-square-distribution.4062/
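Both answers can be confirmed with scipy (our own sketch, not part of the original):

```python
from scipy import stats

# 117.1: L such that P[X > L] = 10% for chi-square with 1 df
print(stats.chi2.ppf(0.90, 1))     # ~2.7055
print(stats.norm.ppf(0.95) ** 2)   # the same value, via the square of the normal quantile

# 117.2: chi-square variance = 2 * df
print(stats.chi2.var(10))          # 20.0
```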


P1.T2.118. Student’s t distribution

AIM: Describe the key properties of Student’s t

118.1 What is the total area under the pdf curve of a student’s t probability distribution; i.e., what is the limit of the CDF as x tends to infinity?

a) Less than 1.0 b) Equal to 1.0 c) Greater than 1.0 d) Depends on the degrees of freedom

118.2 The student’s t distribution is a function of

a) Normal distribution b) Chi-square distribution c) Lognormal distribution d) Normal and chi-square distributions

118.3 Assume we conduct a multivariate regression based on a sample of 32 observations (n=32). The regression produces four regression coefficients, an intercept plus three partial slope coefficients. These OLS estimates are characterized by a student’s t distribution with what, respectively, mean, variance, skew and kurtosis?

a) 0 (mean), 1.00 (variance), 0 (skew), 3.0 (kurtosis) b) 0, 1.08, 0, 3.0 c) 0, 1.00, 0, 3.25 d) 0, 1.08, 0, 3.25


Answers:

118.1. B. (Equal to 1.0.) It is part of the definition of a probability distribution that the area under the curve equals 1.0; i.e., CDF(x tends to infinity) = 1.0. Various degrees of freedom will impact the peak/tail but will not change the total probability.

118.2. D. Normal and chi-square distributions. Rachev: “The t-distribution (also known as the Student t-distribution) occurs again as a function of other random variables, namely the normal and the Chi-square distribution. If X is a standard normal random variable and Z a Chi-square distributed random variable with n degrees of freedom which is independent of X, then by definition the distribution of the random variable Y defined as Y = X/SQRT(Z/n) possesses a t-distribution with n degrees of freedom.”

118.3. D. 0, 1.08, 0, 3.25. For any student’s t (without location & scale; i.e., the one-parameter student’s t), the mean = 0 and the skew = 0. Variance = df/(df-2); in this case, df = 32 - 4 = 28 and variance = 28/(28-2) = 1.077. Excess kurtosis = 6/(df-4); in this case, excess kurtosis = 6/24 = 0.25, such that kurtosis = 3.25; i.e., the student’s t always has a heavy tail, but it’s only a slightly heavy tail!

Key points about student’s t:

Always mean = 0 with symmetry (skew = 0)

Always variance > 1.0

Always slight heavy tail; kurtosis > 3.0 (excess kurtosis > 0.0)

As df increases, approaches normal

For df > 30, it approximates the normal (that is why the normal can be used for “large samples”)

After the normal, this is your most commonly encountered (sampling) distribution! Why? Because under CLT, regression estimates are normally distributed if we know the variance, but if we do not know the population variance, they are student’s t distributed and, in practice, we rarely know the population variance.
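The 118.3 moments are easy to verify (a minimal Python sketch of our own; scipy reports variance and excess kurtosis for the t distribution):

```python
from scipy import stats

# 118.3: Student's t with df = 32 - 4 = 28
df = 28
print(df / (df - 2))                     # variance ~1.077
print(6 / (df - 4))                      # excess kurtosis = 0.25, i.e., kurtosis = 3.25
print(stats.t.stats(df, moments="vk"))   # scipy agrees: (variance, excess kurtosis)
```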

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-118-students-t-distribution.4068/


P1.T2.119. Lognormal distribution

AIM: L1.T2.119. Describe the key properties of the lognormal distribution

119.1 If the variable (Y) is a normal random variable, such that Y ~ N(mu, sigma^2), which of the following (X) variables is lognormally (log-normally) distributed?

a) X = EXP(Y) = e^Y b) X = LN(Y) c) X = Y(1) + Y(2) + ... Y(n) d) X = LN[Y(2)/Y(1)]

119.2 Assume today’s stock price S(0) is $100, the daily log (continuously compounded) return has mean of 0.0 and standard deviation of 0.10 (10%), and tomorrow’s stock price is lognormally distributed. What is the approximate probability that tomorrow’s stock price will exceed $117.94?

a) about 1% b) 1.43% c) 4.46% d) about 5%

119.3 Each of the following is TRUE about the lognormal distribution EXCEPT:

a) Is always non-negative with positive skew and leptokurtic (heavy tailed) b) If price S(t) is lognormal, then LN[S(t)] is normal c) The sum of lognormal variables is also lognormal d) The product of lognormal random variables is also lognormal


Answers:

119.1. A. If (Y) is N(.), then X = EXP(Y) = e^Y is lognormal.

119.2. D. About 5%. The log return is normally distributed, such that LN(117.94/100) = 0.165, which is ~N(0, 0.1^2). In standard normal units, Z = (0.165 - 0 mean)/0.1 = 1.65. The probability = 1 - CDF = 1 - P[Z<1.65] = 1 - 95%, or ~5.0% (4.947%). To confirm directly, note that LOGNORM.DIST(117.94/100, 0, 0.1, true) = 95.053%.

119.3. C. False: the sum of lognormal variables is not lognormal.

(A) is true.

(B) is true by definition. S(t) = S(0)*EXP(Y) where (Y) is the normal return, then S(t)/S(0) = EXP(Y) and LN[S(t)/S(0)] = LN[EXP(Y)] = Y, such that the log return, LN[S(t)/S(0)], is normally distributed while the price level, S(t), is lognormally distributed. In short, a variable is lognormal if the LN(variable) is normal.

(D) product, but not the sum, of lognormals is lognormal
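A quick check of 119.2 in Python with scipy (our own sketch; the source confirms with Excel's LOGNORM.DIST):

```python
import math
from scipy import stats

# 119.2: P[S(1) > 117.94] with S(0) = 100 and daily log return ~ N(0, 0.10^2)
z = math.log(117.94 / 100) / 0.10              # ~1.65
print(1 - stats.norm.cdf(z))                   # ~4.95%
print(1 - stats.lognorm.cdf(1.1794, s=0.10))   # the same answer via the lognormal CDF
```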

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-119-lognormal-distribution.4072/


P1.T2.120. Logistic distribution

AIM: Describe the key properties of Logistic

120.1 What are the skew and kurtosis of the logistic distribution?

a) 0 (skew) and 3 (kurtosis) b) 0 (skew) and 4.2 (kurtosis) c) 1.2 (skew) and variable (kurtosis) d) 1.2 (skew) and light- or thin-tailed

120.2 EACH of the following is TRUE about the logistic distribution EXCEPT for:

a) It has one parameter b) It is unbounded above and below (i.e., infinite support) c) It is symmetrical like the normal d) The mean equals the median equals the mode equals the location parameter


Answers:

120.1. B. 0 skew and 4.2 kurtosis (excess kurtosis = 6/5).

120.2. A. Logistic has two params, location (alpha) and scale (beta > 0).

In regard to (B), (C) and (D) each is true.

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-120-logistic-distribution.4080/


P1.T2.121. Extreme value distributions

AIM: Describe the key properties of Extreme Value distributions.

121.1. The assigned Rachev (Fat-tailed and skewed asset returns) introduces two (2) basic extreme value distributions; the named extreme value types (e.g., Gumbel, Frechet) are already included as special cases of one of them. Which are the two basic extreme value distributions?

a) Gumbel type and Generalized Extreme Value (GEV) b) Gumbel type and Frechet c) Generalized Extreme Value (GEV) and Generalized Pareto Distribution (GPD) d) Generalized Pareto distribution (GPD) and peaks-over-threshold (POT)

121.2. Which best summarizes a key difference between the two basic extreme value distributions in Rachev?

a) One characterizes the maximum loss within time blocks (or a large sample) while the other characterizes losses above a threshold

b) One characterizes the body of the distribution (and its central tendency) while the other characterizes only the tail

c) One fits the threshold parameter by statistical method (e.g., MLE) while the other lets the user choose the threshold parameter subjectively

d) The key difference is the value of the shape (Greek xi) parameter; there is an EV for shape < 0 and an EV for shape >0.

121.3 Which case of the generalized extreme value (GEV) distribution is most likely to characterize financial return data?

a) Frechet b) Gumbel c) Weibull d) GEV with shape (xi) param equal to zero


Answers: 121.1. C. GEV and GPD

In regard to (a), Gumbel is a special case of GEV when shape (aka, tail index) param (xi) = 0.

In regard to (b), Gumbel and Frechet are both GEV based on shape param

In regard to (d), GPD characterizes POT. Please note: Rachev is consistent with Dowd in Level 2, where Dowd explores GEV (block maxima) versus GPD (POT).

121.2. A. GEV characterizes maximum of large sample (“block maxima” or blocks of time) while GPD characterizes peaks-over-threshold.

In regard to (b), both EV distributions characterize the “child” tail and neither the body/central tendency; i.e., the idea of EV is to characterize the tail.

In regard to (c), GEV does not have a threshold param; in the GPD(POT), the threshold choice is inevitably subjective.

In regard to (d), the GEV has three cases depending on the shape param (Gumbel, Weibull and Frechet) but this is not a difference between GEV and GPD

121.3. A. Frechet, which has heavy tails

In regard to (b) and (d), shape param equal to zero implies Gumbel which has light/~normal tails

In regard to (c), Weibull has light tails

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-121-extreme-value-distributions.4085/


P1.T2.122. Stable distributions

AIM: Explain the summation stability of normal distributions.

122.1 Which best describes a stable distribution?

a) A stable distribution has conditional moments that are equal to unconditional moments
b) A stable distribution exhibits central tendency such that the distribution of the sum of many copies of itself converges on the normal distribution
c) A stable distribution has a defined location, scale, skew and kurtosis
d) A stable random variable can be added to itself such that the sum preserves the same distribution as each independent variable

122.2 Each of the following is a stable distribution EXCEPT for:

a) Normal b) Gamma c) Cauchy d) Levy

122.3 Each of the following is true about the stable distribution EXCEPT for:

a) We can specify a distribution that is both stable and heavy-tailed
b) If a financial asset’s daily returns are i.i.d. normal, then its monthly returns are normal
c) The stable distribution, like the normal, only requires two parameters
d) The sum of i.i.d. normal or non-normal random variables converges to a stable distribution


Answers: 122.1 D. A stable random variable can be added to itself such that the sum preserves the same distribution as each independent variable.

In regard to (A), this is a pretty good answer, just not as exacting as the correct (D). Rachev: “Another interesting and important property of normal distributions is their summation stability. If you take the sum of several independent random variables, which are all normally distributed with mean µ(i) and standard deviation σ(i), then the sum will be normally distributed again.” ... Another way to look at this is: a stable distribution does not depend on the time interval if the variables summed are independent.

See the excellent wikipedia entry: http://en.wikipedia.org/wiki/Stable_distribution

122.2. B. (Gamma) The three special cases of a stable distribution are normal (Gaussian), Cauchy and Levy.

122.3. C. (The stable has four parameters; the normal is but a special case.)

In regard to (a), the Cauchy distribution has “much fatter tails than the normal distribution” and the Levy can exhibit much fatter tails.

In regard to (b), this is the essential “summation stability” property illustrated by Rachev: “Another interesting and important property of normal distributions is their summation stability. If you take the sum of several independent random variables, which are all normally distributed with mean µ(i) and standard deviation σ(i), then the sum will be normally distributed again.”

In regard to (d), “The second property [Central Limit Theorem] is also well-known from the Gaussian framework and it generalizes to the stable case. Specifically, by the Central Limit Theorem, appropriately normalized sums of independent and identically distributed (i.i.d) random variables with finite variance converge weakly to a normal random variable, and with infinite variance, the sums converge weakly to a stable random variable. This gives a theoretical basis for the use of stable distributions when heavy tails are present and stable distributions are the only distributional family that has its own domain of attraction—that is a large sum of appropriately standardized i.i.d random variables will have a distribution that converges to a stable one. This is a unique feature and its fundamental implications for financial modeling are the following: If changes in a stock price, interest rate or any other financial variable are driven by many independently occurring small shocks, then the only appropriate distributional model for these changes is a stable model, that is, normal or nonnormal stable.”

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-122-stable-distributions.4089/


P1.T2.123. Hazard rate of exponential variable

AIM: Describe the hazard rate of an exponentially distributed random variable.

123.1. What is the three-year cumulative probability of default (cumulative PD) of a bond with default intensity (hazard rate) of 3.0%?

a) 3.00% b) 6.96% c) 8.61% d) 8.73%

123.2 The exponential distribution informs us that the five-year cumulative probability of default (cumulative PD) of a bond is 10.0%. What is the implied default intensity?

a) 2.00% b) 2.11% c) 2.56% d) 2.96%

123.3. A call center receives an average of 20 customer support calls per hour; the number of calls per hour can be characterized by the Poisson distribution. What is the probability that the phone will NOT ring in the next three (3) minutes?

a) 3.7% b) 23.7% c) 36.8% d) 63.2%

123.4 If used to model defaults, which of the following best describes the lambda parameter in the exponential distribution?

a) The instantaneous probability of default at time (t) conditional on survival until time (t)
b) The instantaneous probability of default at time (t) conditional on time (t); i.e., not constant over time
c) The cumulative probability of default until time (t) conditional on survival until time (t)
d) The unconditional probability of default at time (t)


Answers:

123.1. C. 8.61%. Cumulative PD is given by the exponential CDF, such that P[default within 3 years] = 1 - EXP(-3 years * 3% hazard) = 8.61%.

123.2. B. 2.11%. Cumulative PD = 1 - EXP(-lambda*T) = 10%, such that: 1 - 10% = EXP(-lambda*5); taking LN() of both sides: LN(90%) = -lambda*5, and lambda = -1/5*LN(90%) = 2.11%.

123.3. C. 36.8%. Lambda = 20 calls/60 minutes = 1/3 calls per minute; i.e., lambda (a.k.a., hazard rate) = average rate = 1/3, and the beta parameter in the exponential = 1/lambda = 3.0. The “waiting time” probability is characterized by the exponential distribution: P[next call within 3 min] = 1 - EXP(-T*lambda) = 1 - EXP(-3*1/3) = 63.2%, such that P[no calls in the next 3 min] = EXP(-3*1/3) = EXP(-1) = 36.8%.

123.4. A. The hazard rate (default intensity) is a conditional PD. Rachev: “What is the interpretation of this expression? λ(∆t) represents a ratio of a probability and the quantity ∆t. The probability in the numerator represents the probability that default occurs in the time interval (t,t + ∆t] conditional upon the fact that Ford Motor Company survives until time t. Now the ratio of this probability and the length of the considered time interval can be denoted as a default rate or default intensity. In applications different from credit risk we also use the expressions hazard or failure rate. Now, letting ∆t tend to zero we finally obtain after some calculus the desired relation λ = 1/β.”

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-123-hazard-rate-of-exponential-variable.4093/
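A minimal Python sketch of our own that reproduces 123.2 and 123.3:

```python
import math

# 123.2: implied hazard rate from a 10% five-year cumulative PD
print(-math.log(1 - 0.10) / 5)    # ~2.11%

# 123.3: P[no call in the next 3 minutes] at 20 calls/hour (lambda = 1/3 per minute)
print(math.exp(-3 * (1 / 3)))     # exp(-1) ~ 36.8%
```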


P1.T2.124. Exponential versus Poisson

AIM: Explain the relationship between exponential and Poisson distributions.

124.1 Which is a continuous distribution?

a) Binomial b) Poisson c) Exponential d) None

124.2 Your staff has determined that your 95% daily VaR model is perfectly accurate: on any given day, the probability that the loss exceeds VaR is 5%. What is the probability that next month, which has 20 trading days, the VaR will NOT be exceeded on three or more days (VaR will be exceeded on two days or less)?

a) 86.47% b) 91.97% c) 92.45% d) 99.99%

124.3 Your staff observes that a 95% daily VaR has been exceeded, on average, once every month (i.e., the daily loss exceeds the VaR one day per month, on average), where there are 20 trading days in a month. What is the probability that next month the VaR will NOT be exceeded on three or more days (VaR will be exceeded on two days or less)?

a) 86.47% b) 91.97% c) 92.45% d) 99.99%

124.4 Your staff observes that a 95% daily VaR has been exceeded, on average, once every month; there are 20 trading days per month (same assumptions as the previous question, 124.3). What is the probability that VaR will be exceeded at some point in the next two months?

a) 86.47% b) 91.97% c) 92.45% d) 99.99%


Answers:

124.1. C. Exponential. Binomial and Poisson are discrete.

124.2. C. Binomial: P[X <= 2] = 92.45%, with p = 5%, x = 2, n = 20. Binomial CDF P[X <= 2] = BINOM.DIST[2 successes, 20 trials, p = 5%, TRUE = CDF] = 92.45%; i.e., equal to P[X=0] + P[X=1] + P[X=2] = 35.85% + 37.74% + 18.87%.

124.3. B. Poisson: P[X <= 2] = 91.97%, with lambda = 1 per month, X = 2. Poisson CDF P[X <= 2] = POISSON.DIST[X = 2, lambda = 1, TRUE = CDF] = 91.97%.

124.4. A. Exponential: P[event < 2 months] = 1 - EXP(-2 * 1) = 86.47%. In the exponential, the probability of the Poisson event occurring within time (T) is given by 1 - EXP(-T*lambda).

In summary:

Binomial is used when we have the exact probability (p) of the event occurring.

Poisson is used when we have an average rate; e.g., events/day.

Exponential is used to characterize the waiting time (inter-arrival time) of a Poisson variable.
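All three answers in one short sketch (Python with scipy; our own verification of the Excel formulas quoted above):

```python
import math
from scipy import stats

# 124.2: binomial, P[X <= 2 | n = 20, p = 5%]
print(stats.binom.cdf(2, 20, 0.05))   # ~92.45%

# 124.3: Poisson, P[X <= 2 | lambda = 1 per month]
print(stats.poisson.cdf(2, 1))        # ~91.97%

# 124.4: exponential waiting time, P[first exceedance within 2 months]
print(1 - math.exp(-2 * 1))           # ~86.47%
```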

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-124-exponential-versus-poisson.4095/


P1.T2.125. Generalized Pareto distribution (GPD)

AIM: Explain why the generalized Pareto distribution is commonly used to model operational risk events.

125.1 Which is the best reason to prefer a generalized Pareto distribution over a Frechet distribution to model operational risk events?

a) The need to model loss severity as continuous rather than discrete
b) The desire to model extreme losses above a threshold (tail) rather than local (block) maxima
c) The desire to model heavy tails
d) The need to calculate value at risk (VaR) and expected shortfall (ES)

125.2 The generalized Pareto distribution naturally characterizes (best answer):

a) The distribution of (peak) losses above a high threshold conditional on a parent distribution that belongs to an EVT class and must be known to us

b) The distribution of (peak) losses above a high threshold conditional on a parent distribution that belongs to an EVT class but can be unknown to us

c) The distribution of (peak) losses above a high threshold for any common distribution and must be known to us

d) The distribution of (peak) losses above a high threshold for any common distribution but can be unknown to us

125.3 (difficult) Let the generalized Pareto distribution (GPD) characterize only the 5% tail of the loss distribution, per the extreme value theory (EVT) method. As Rachev shows, the GPD CDF is given by F(x) = 1 - (1 + Xi*x/sigma)^(-1/Xi), where Xi is the shape (aka, tail) index and sigma is the scale parameter. Assume sigma is 0.90 and Xi is 0.12; i.e., empirically plausible parameters. The threshold loss (u) is 2.0. What is (i) the probability that the excess loss over the threshold is less than 1.6 conditional on the loss exceeding the 2.0 threshold and, related, (ii) the unconditional probability that the loss does not exceed 3.6?

a) 90% and 95% b) 95% and 99% c) 80% and 95% d) 80% and 99%


Answers:

125.1. B. The desire to model extreme losses above a threshold (tail) rather than local (block) maxima.

In regard to (A), both EVT distributions are continuous (loss severity is generally continuous while loss frequency is discrete)

In regard to (C), both model heavy tails.

In regard to (D), as both EVT distributions characterize the tail distribution, it follows that both can model VaR and ES.

125.2. D. GPD is the convergence of POT for any (parent) distribution.

Rachev: “The generalized Pareto distribution occurs in a natural way as the distribution of so-called “peaks over threshold.” Let us assume we have a sequence of independent and identically distributed random variables X1, X2, … where the distribution of the single Xi is given by the distribution function F and we assume that the maximum of the random variables converges in distribution to a generalized extreme value distribution with parameter ξ. Now consider a large enough threshold u, and consider the distribution of X – u conditional on X being greater than u. It can be shown that the limit of this conditional distribution will be a member of the class of generalized Pareto distributions. Possible applications are in the field of operational risks (see Chapter 16), where one is only concerned about losses above a certain threshold.”

Dowd: “This [GP distribution] gives the probability that a loss exceeds the threshold u by at most x, given that it does exceed the threshold. The distribution of X itself can be any of the commonly used distributions: normal, lognormal, t, etc., and will usually be unknown to us. However, as u gets large, the Gnedenko–Pickands–Balkema–deHaan (GPBdH) theorem states that the distribution Fu(x) converges to a generalised Pareto distribution, given by ... This distribution has only two parameters: a positive scale parameter, β, and a shape or tail index parameter, ξ, that can be positive, zero or negative. This latter parameter is the same as the tail index encountered already with GEV theory. The cases that usually interest us are the first two, and particularly the first (i.e., ξ > 0), as this corresponds to data being heavy tailed. The GPBdH theorem is a very useful result, because it tells us that the distribution of excess losses always has the same form (in the limit, as the threshold gets high), pretty much regardless of the distribution of the losses themselves. Provided the threshold is high enough, we should therefore regard the GP distribution as the natural model for excess losses.”

125.3. D. (80% and 99%) The GPD CDF gives P[X-u <= x | X > u], which is F(x) = 1 - (1 + Xi*x/sigma)^(-1/Xi). In this case, F(1.6) = 1 - (1 + 0.12*1.6/0.9)^(-1/0.12) = 0.80 = 80%; i.e., in the 5% tail, the conditional probability that X is less than 3.6 (+1.6 over the threshold) is 80%. The conditional probability that X exceeds 3.6 is therefore 20%, which implies the unconditional probability that X exceeds 3.6 is 20% * 5% = 1.0% (such that the unconditional probability the loss does not exceed 3.6 is 99%).

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-125-generalized-pareto-distribution-gpd.4125/
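The 125.3 arithmetic, as a minimal Python sketch of our own (evaluating the GPD CDF given in the question):

```python
# 125.3: GPD CDF F(x) = 1 - (1 + xi*x/sigma)^(-1/xi) with xi = 0.12, sigma = 0.90
xi, sigma = 0.12, 0.90
F = lambda x: 1 - (1 + xi * x / sigma) ** (-1 / xi)

cond = F(1.6)                # ~0.80: conditional probability, given the loss exceeds u = 2.0
print(cond)
print((1 - cond) * 0.05)     # ~1.0%: unconditional probability the loss exceeds 3.6
```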


P1.T2.126. Mixtures of distributions

AIM: Explain the concept of mixtures of distributions.

126.1. If we assume (n) number of probability density functions (pdfs) and (n) positive real numbers given by alpha(1), alpha(2) ... alpha(n), which best summarizes Rachev’s concept of a mixture of distributions?

a) A distribution that characterizes the random variable that is the sum of (n) probability density functions

b) A distribution whose density itself is a probability-weighted sum of the component pdf distributions and the alpha weights sum to 1.0

c) A distribution that is piece-wise characterized by different distributions; e.g., the tail is characterized by a different density function than the body

d) A distribution that is randomly characterized by one of the (n) probability density functions; e.g., pdf(1) in one trial, pdf(2) in another trial

126.2. The sum of several independent normal random variables is:

a) Approximately or asymptotically normal per stability (or summation stability) property and central limit theorem (CLT)

b) Approximately or asymptotically normal per location-scale invariance property

c) Approximately or asymptotically normal due to properties of a multivariate normal mixture distribution

d) Heavy-tailed (kurtosis > 3.0) due to properties of a multivariate normal mixture distribution

126.3 A mixture distribution of two component normal distributions exhibits:

a) light tails; kurtosis <= 3.0 b) normal tails; kurtosis = 3.0 c) heavy tails; kurtosis >= 3.0 d) any of the above depending on the weights

126.4 What is a necessary condition for a mixture distribution of (n) probability density functions (pdfs) and (n) weights given by alpha(1), alpha(2) ... alpha(n)?

a) The component distributions must be normal b) The component distributions must have identical expectations (mean) c) The component distributions must have identical variances d) The weights must sum to 1.0


Answers: 126.1. B. A distribution whose density itself is a probability-weighted sum of the component pdf distributions and the alpha weights sum to 1.0

In regard to (A), this is tempting but this is just the sum of random variables; this is not a mixture distribution!

126.2 A. Approximately or asymptotically normal per stability (or summation stability) property and central limit theorem (CLT)

In regard to (C) or (D), this is just the sum of normals, not a normal mixture!

126.3 C. A normal mixture will exhibit heavy tails.

126.4 D. The weights must sum to 1.0

In regard to (A), (B) and (C), none are requirements. The mixture is flexible with respect to the component densities.
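For intuition on 126.3, a quick simulation helps. Below is a minimal Python sketch (the 50/50 weights and the component volatilities of 1 and 3 are our own illustrative choices): a mixture of two zero-mean normals with different variances shows kurtosis well above 3.0.

import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
# 50/50 mixture of N(0, 1) and N(0, 3^2): same mean, different variances
pick = rng.random(n) < 0.5
x = np.where(pick, rng.normal(0, 1, n), rng.normal(0, 3, n))

z = (x - x.mean()) / x.std()
kurtosis = (z ** 4).mean()   # raw kurtosis; a normal would be 3.0
print(round(kurtosis, 2))    # ~4.9 here => heavy tails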

Discuss in forum here: http://www.bionicturtle.com/forum/threads/l1-t2-126-mixtures-of-distributions.4194/


Bayesian Analysis (Miller Chapter 6)

P1.T2.500. Bayes theorem (Miller Chapter 6)

P1.T2.501. More Bayes Theorem (Miller Chapter 6)

P1.T2.302. Bayes' Theorem (Miller) (Older set of questions)

P1.T2.500. Bayes theorem (Miller Chapter 6)

Learning Objectives: Describe Bayes’ theorem and apply this theorem in the calculation of conditional probabilities. Compare the Bayesian approach to the frequentist approach.

500.1. According to Miller, each of the following is true about Bayes' theorem and Bayesian analysis EXCEPT which is false?

a) Bayes' theorem is often described as a procedure for updating beliefs about the world when presented with new information

b) Bayes' theorem updates a prior probability with evidence (aka, likelihood) to generate a posterior probability

c) Risk management, performance analysis and stress testing are areas where we often have very little data, and where the data tends to be noisy, such that the frequentist approach is superior to the Bayesian approach

d) Although the theorem itself is simple, Bayes' Theorem can be applied to a wide range of problems (e.g., it is used in everything from spam filters to machine translation and to the software that controls self-driving cars) and its application can often be quite complex

500.2. You have a portfolio of bonds, each with a 1.0% probability of default. An analyst develops a model for forecasting bond defaults, but the model is only 70.0% accurate. In other words, of the bonds that actually default, the model identifies only 70.0% of them; likewise, of the bonds that do not default, the model correctly predicts that 70% will not default. Given that the model predicts that a bond will default, what is the probability that it actually defaults? (note: this is a variation on Miller's question 6-2).

a) 1.00% b) 1.43% c) 2.30% d) 7.00%

500.3. You have a model that classifies Federal Reserve statements as either bullish or bearish. When the Fed makes a bullish announcement, you expect the market to be up 80.0% of the time. The market has 60.0% probability of being up, and a 40.0% probability of being flat or down (the only two states are up, or not up). The Fed makes bullish announcements 60.0% of the time. What is the probability that the Fed made a bearish announcement, given that the market was up? (note: this is a variation on Miller's question 6-7)

a) 20.0% b) 40.0% c) 60.0% d) 80.0%


Answers:

500.1. C. False. According to Miller: "In risk management, performance analysis and stress testing are examples of areas where we often have very little data, and the data we do have is very noisy. These areas are likely to lend themselves to Bayesian analysis."

In regard to (A), (B) and (D), each is TRUE.

Miller: "Bayesian analysis is used in a number of fields. It is most often associated with computer science and artificial intelligence, where it is used in everything from spam filters to machine translation and to the software that controls self-driving cars. The use of Bayesian analysis in finance and risk management has grown in recent years, and will likely continue to grow ... Bayes' theorem is often described as a procedure for updating beliefs about the world when presented with new information ... " 500.2. C. 2.30% P[actual = D | model = D] = P[model = D | actual = D] * P[actual = D] / P[model = D] = = P[model = D | actual = D] * P[actual = D] / ( P[model = D | actual = D] * P[actual = D] + P[model = D | actual = not default] * P[actual = not default] ) = 70.0% * 1.0%/ [(70.0% * 1.0%) + (99.0% * 30.0% )] = 2.30% 500.3. A. 20.0% Given P[up] = 60.0%, P[bullish] = 60.0%, and P[up | bull] = 80.0%, P[bear | up] = 1 - P[bull | up] = 1 - P[up | bull] * P[bull] / P[up] = 1 - 80.0% * 60.0% / 60.0% = 1 - 80.0% = 20.0% Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-500-bayes-theorem.8320/


P1.T2.501. More Bayes Theorem (Miller Chapter 6)

Learning Objectives: Apply Bayes’ theorem to scenarios with more than two possible outcomes.

501.1. As a risk analyst, you are asked to look at Whitetech Corporation, which has issued both equity and bonds. The bonds can either be downgraded (D), be upgraded (U), or have no change (N) in rating. The stock can either perform above the market, with an unconditional probability of 60.0%, or below the market, with an unconditional probability of 40.0%. That is, P(A) = 60.0% and P(B) = 40.0%. If the equity performs above the market, there is a 20.0% probability of a bond upgrade; P[U|A] = 20.0%. If the equity performs below the market, there is only a 10.0% probability of a bond upgrade; P[U|B] = 10.0%.

If the bond was upgraded, what is the probability that the equity finished the period below the benchmark; i.e., Prob [B|U] ?

a) 4.0% b) 10.0% c) 25.0% d) 50.0%


501.2. As a risk analyst, you are asked to analyze the bonds of Ganztrax Corporation in the context of economic cycles. The economic cycle can be in one of three states: recession, flat or growth. The corresponding unconditional probabilities are P[R] = 25.0%, P[F] = 35.0%, and P[G] = 40.0%. If the economy is in a growth cycle, then the bond can be either upgraded, downgraded, or unchanged; the corresponding conditional probabilities are P[U|G] = 35.0%, P[N|G] = 60.0% and P[D|G] = 5.0%.

If the bond was downgraded, which is nearest to the probability that the economy is in a growth cycle; i.e., Prob[G|D] ?

a) 2.0% b) 9.5% c) 13.0% d) 15.4%

501.3. Your firm is testing a new quantitative strategy. The analyst who developed the strategy claims that there is a 60.0% probability that the strategy will generate positive returns on any given day. After 30 trading days the strategy has generated a profit 21 times, which is fully 70.0%. Assume that there are only two possible states of the world: either the analyst is correct, or the strategy is equally likely to gain or lose money on any given day. Your prior assumption was that these two states of the world were equally likely. Which is nearest to the probability that the analyst is right and the actual probability of positive returns for the strategy is 60%? (Please note: this question is not exam-realistic because it requires spreadsheet/software to retrieve binomial probabilities. Source: variation on Miller 6-4).

a) 60.0% b) 74.3% c) 86.1% d) 90.5%


Answers:

501.1. C. 25.0%. Prob[B|U] = 25.0% and Prob[A|U] = 75.0%. Per Bayes, P[B|U] = P[U|B] * P[B] / P[U] = (10.0% * 40.0%) / (20.0% * 60.0% + 10.0% * 40.0%) = 4.0%/16.0% = 25.0%. (A probability matrix reaches the same result.)

501.2. D. 15.4%. We can use a probability matrix such that P[G|D] = 2.0%/13.0% = 15.38%; the joint P[G, D] = P[D|G] * P[G] = 5.0% * 40.0% = 2.0%, and 13.0% is the unconditional P[D], i.e., the sum of the joint default probabilities across the three economic states.

501.3. C. 86.1%. The likelihoods are binomial probabilities of exactly 21 profitable days in 30: P[21 | p = 0.60] = 8.228% and P[21 | p = 0.50] = 1.332%. With equal priors (which cancel), per Bayes, P[p = 0.60 | 21] = P[21 | p = 0.60] / (P[21 | p = 0.60] + P[21 | p = 0.50]) = 8.228% / 9.560% = 86.06%.

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-501-more-bayes-theorem-miller.8324/
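Because 501.3 requires binomial probabilities, a short Python sketch is the easiest way to replicate it (a minimal sketch using scipy's binomial PMF; the variable names are ours):

from scipy.stats import binom

# Equal priors; likelihood = P[exactly 21 profitable days in 30 trials]
like_skilled = binom.pmf(21, 30, 0.60)   # ~8.228%
like_coin    = binom.pmf(21, 30, 0.50)   # ~1.332%
posterior = like_skilled * 0.5 / (like_skilled * 0.5 + like_coin * 0.5)
print(round(posterior, 4))               # ~0.8606 -> 86.1%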


P1.T2.302. Bayes' Theorem (Miller) (Older set of questions)

AIMs: Define and calculate a conditional probability, and distinguish between conditional and unconditional probabilities. Describe Bayes’ Theorem and apply this theorem in the calculation of conditional probabilities.

302.1. There is a prior (unconditional) probability of 20.0% that the Fed will initiate Quantitative Easing 4 (QE 4). If the Fed announces QE 4, then Macro Hedge Fund will outperform the market with a 70% probability. If the Fed does not announce QE 4, there is only a 40% probability that Macro will outperform (and a 60% probability that Macro will under-perform; like the Fed's announcement, there are only two outcomes). If we observe that Macro outperforms the market, which is nearest to the posterior probability that the Fed announced QE 4?

a) 20.0% b) 27.9% c) 30.4% d) 41.6%

302.2. The following probability matrix displays the joint probabilities with respect to two bonds, an investment grade bond and a speculative (junk) bond:

For example, the joint probability that both bonds default is 0.060%; the joint probability that both survive is 96.030%. The full matrix (rows = junk bond, columns = investment-grade bond) is:

                 i defaults    i survives
j defaults         0.060%        2.970%
j survives         0.940%       96.030%

Consider two posterior probabilities:

I. If we have already observed that the junk bond has defaulted, what is the (posterior) probability that the investment-grade bond defaulted; i.e., Prob [i default | j default]

II. If we have already observed that the investment-grade bond has defaulted, what is the (posterior) probability that the junk bond defaulted; i.e., Prob [j default | i default]

What are these probabilities, respectively?

a) Prob[i default | j default] = 0.06% and Prob[j default | i default] = 2.79% b) Prob[i default | j default] = 1.98% and Prob[j default | i default] = 6.00% c) Prob[i default | j default] = 3.27% and Prob[j default | i default] = 4.25% d) Can't answer, we need unconditional (marginal) probabilities


302.3. Next year the economy will experience one of three states: a downturn, stable state, or growth. The following probability matrix displays joint probabilities of a bond default and the economic state:

For example, the joint probability that the economy is stable and the bond defaults is 1.0%; the unconditional probability that the economy will be stable is 50.0% = 49.0% + 1.0%. Per the matrix, the joint default probabilities are 0.60% (downturn), 1.0% (stable), and 0.30% (growth). If we observe that the bond has defaulted, what is the (posterior) probability that the economy experienced a downturn?

a) 0.60% b) 19.40% c) 26.33% d) 31.58%


Answers:

302.1. C. 30.4%

Per Bayes, P(QE 4 | Macro outperforms) = Joint Prob(QE 4, outperforms) / Unconditional Prob(outperforms) = (20%*70%)/(20%*70% + 80%*40%) = 14%/46% = 30.435%

302.2. B. Prob[i default | j default] = 1.98% and Prob[j default | i default] = 6.00%

Where i = investment grade and j = junk bond,

Per Bayes P[i defaults | j default] = Joint Prob [i defaults, j defaults] / Unconditional Prob [j defaults] = 0.060% / [2.970% + 0.060%] = 0.060%/3.030% = 1.9802%

Per Bayes P[j defaults | i default] = Joint Prob [i defaults, j defaults] / Unconditional Prob [i defaults] = 0.060% / [0.940% + 0.060%] = 0.060% /1.0% = 6.00%

Please note these bonds are not independent as 0.060% <> 3.030% * 1.0%, where 3.030% = unconditional Prob[j defaults]. Here's a further explanation from the forum: in the first case, we want the probability of an investment-grade default conditional on the junk bond having defaulted; i.e., P[i defaults | j default].

As junk (j) defaulting is the given (i.e., "conditional on the junk bond having defaulted"), we cannot end up in the survive ROW; we must end up in the default ROW, where the sum of probabilities is 2.970% + 0.060% = 3.030%. Prob[i defaults | j default] is the 0.060% cell divided by the row total (3.030%). This is an application of Bayes.

Similarly, if the given is instead that the investment grade (i) bond defaults, we must end up in the default COLUMN, where the sum of probabilities is 0.940% + 0.060% = 1.0%. Prob[j defaults | i default] is the 0.060% cell divided by the column total (1.0%).
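The same row/column logic is easy to replicate in a few lines of Python (a minimal sketch; the matrix layout follows the explanation above):

import numpy as np

# Joint matrix: rows = junk bond (defaults, survives); cols = IG bond (defaults, survives)
joint = np.array([[0.00060, 0.02970],    # junk defaults
                  [0.00940, 0.96030]])   # junk survives
p_j_default = joint[0].sum()             # unconditional P[j defaults] = 3.030%
p_i_default = joint[:, 0].sum()          # unconditional P[i defaults] = 1.000%
print(joint[0, 0] / p_j_default)         # P[i default | j default] ~ 1.98%
print(joint[0, 0] / p_i_default)         # P[j default | i default] = 6.00%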


302.3. D. 31.58%

Per Bayes, P[downturn | default] = Joint Prob[downturn, default] / Unconditional Prob[default] = 0.60%/(0.60% + 1.0% + 0.30%) = 0.60%/1.90% = 31.58%

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-302-bayes-theorem-miller.6767/


Hypothesis Testing & Confidence Intervals (Miller Chapter 7) P1.T2.313. MILLER'S HYPOTHESIS TESTING P1.T2.314. MILLER'S ONE- AND TWO-TAILED HYPOTHESES P1.T2.315. MILLER'S HYPOTHESIS TESTS, CONTINUED

P1.T2.313. Miller's Hypothesis Testing

AIMs: Define, calculate and interpret the mean and variance of the sample mean. Define and estimate the sample mean and sample variance. Define and construct a confidence interval.

313.1. Defaults in a large bond portfolio follow a Poisson process where the expected number of defaults each month is four (λ = 4 per month). The number of defaults that occur during a single month is denoted by d(i). Therefore, over a one-year period, a sample of twelve observations is produced: d(1), d(2), ..., d(12). The average of these twelve observations is the monthly sample mean. This sample mean naturally has an expected value of four. Which is nearest to the standard error of this monthly sample mean; i.e., the standard deviation of the sampling distribution of the mean?

a) 0.11 b) 0.33 c) 0.58 d) 4.00

313.2. A random sample of 36 observations drawn from a normal population returns a sample mean of 18.0 with sample variance of 16.0. Our hypothesis is: the population mean is 15.0 with population variance of 10.0. Which are nearest, respectively, to the test statistics of the sample mean and sample variance (given the hypothesized values, naturally)?

a) t-stat of 3.0 and chi-square stat of 44.3 b) t-stat of 4.5 and chi-square stat of 56.0 c) t-stat of 6.8 and chi-square stat of 57.6 d) t-stat of 9.1 and chi-square stat of 86.4

313.3. A random sample of 41 hedge fund returns, drawn from a normal population, returns a sample mean of +5.0% with sample standard deviation of 2.0%. What are the two-sided 95% confidence intervals for, respectively, the population mean and population standard deviation (note: this requires non-calculator lookups or calculations, which are normally beyond the actual exam's scope)?

a) mean interval = {4.37%, 5.63%} and standard deviation interval = {1.64%, 2.56%} b) mean interval = {4.05%, 5.93%} and standard deviation interval = {1.22%, 2.99%} c) mean interval = {3.70%, 6.12%} and standard deviation interval = {0.94%, 3.18%} d) mean interval = {3.22%, 7.49%} and standard deviation interval = {0.73%, 4.04%}


Answers:

313.1. C. 0.58

A Poisson distribution has both mean and variance equal to its only parameter, lambda. In this case, the variance per month is therefore 4 and the variance of the sample mean = 4/n. In this case, with 12 observations (months), the variance of the sample mean = 4/12, such that the standard error (standard deviation) = SQRT(4/12) = 0.5774.

313.2. B. t-stat of 4.5 and chi-square stat of 56.0

If we do not know the population variance, the test of the sample mean relies on the t-statistic, where the standard error (SE) = SQRT(16/36) = 4/6 and the t-statistic = ABS(18-15)/(4/6) = 4.5. With a t-stat of 4.5, we can reject the null hypothesis that the population mean is 15.0 (the two-sided p-value is 0.007%, such that we can reject with any confidence of 99.993% or less).

As the population is normal, the test of the sample variance relies on the chi-square value = (n-1)*(sample variance/hypothesized variance). In this case, the chi-square statistic = (36-1)*16/10 = 56.00, which follows a chi-square distribution with 35 degrees of freedom. (We could reject the null with 95% confidence, but we fail to reject the null with 99% confidence.)

313.3. A. mean interval = {4.37%, 5.63%} and standard deviation interval = {1.64%, 2.56%}

In regard to the sample mean, the critical (lookup) two-tailed t at 95% confidence with 40 degrees of freedom is 2.021, such that: Confidence interval = +5.0% +/- 2.021*2.0%/SQRT(41) = 5.0% +/- 0.631%; note, we use n = 41 to compute the standard error, not the 40 d.f.

In regard to the sample variance, the chi-square lookup values are 24.43 and 59.34; i.e., 24.43 = CHISQ.INV(2.5%, 40) and 59.34 = CHISQ.INV.RT(2.5%, 40). The 95% lower bound for the variance = 2.0%^2*40/59.34, such that the lower bound for the standard deviation = SQRT(2.0%^2*40/59.34) = 1.642%; and the 95% upper bound for the variance = 2.0%^2*40/24.43, such that the upper bound for the standard deviation = SQRT(2.0%^2*40/24.43) = 2.559%.
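A minimal Python sketch of both 313.3 intervals, for those who want to replicate the lookups (scipy's t and chi2 replace the Excel functions above):

from math import sqrt
from scipy.stats import t, chi2

n, xbar, s = 41, 0.05, 0.02
t_crit = t.ppf(0.975, n - 1)                 # ~2.021
half = t_crit * s / sqrt(n)
print(xbar - half, xbar + half)              # ~(4.37%, 5.63%)

lo = chi2.ppf(0.025, n - 1)                  # ~24.43
hi = chi2.ppf(0.975, n - 1)                  # ~59.34
print(sqrt(s**2 * (n - 1) / hi), sqrt(s**2 * (n - 1) / lo))   # ~(1.64%, 2.56%)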

Discuss in forum here: http://www.bionicturtle.com/forum/threads/p1-t2-313-millers-hypothesis-testing.7108/


P1.T2.314. Miller's one- and two-tailed hypotheses

AIMs: Define and interpret the null hypothesis and the alternative hypothesis, and calculate the test statistics. Define, interpret, and calculate the t-statistic. Differentiate between a one-tailed and a two-tailed test and explain the circumstances in which to use each test.

314.1. You are given the following sample of annual returns for a portfolio manager: -6.0%, -3.0%, -2.0%, 0.0%, 1.0%, 2.0%, 4.0%, 5.0%, 7.0%, 10.0%. The sample mean of these ten (n = 10) returns is +1.80%. The sample standard deviation is 4.850%. The sample mean is positive, but how confident are we that the population mean is positive? (note: this is a simplified version of Miller's problem 5.2, since it provides the sample mean and standard deviation, but it nevertheless does require calculations/lookup)

a) t-stat of 1.17 implies one-sided confidence of about 86.5% b) t-stat of 1.29 implies two-sided confidence of about 88.3% c) t-stat of 2.43 implies one-sided confidence of about 90.7% d) t-stat of 3.08 implies two-sided confidence of about 97.4%

314.2. A sample of 25 money market funds shows an average return of 3.0% with standard deviation also of 3.0%. Your colleague Peter conducted a significance test of the following alternative hypothesis: the true (population) average return of such funds is GREATER THAN the risk-free rate (Rf). He concludes that he can reject the null hypothesis with a confidence of 83.64%; i.e., there is a 16.36% chance (p value) that the true return is less than or equal to the risk-free rate. What is the risk-free rate, Rf? (note: this requires lookup-calculation)

a) 1.00% b) 1.90% c) 2.00% d) 2.40%

314.3. A random sample of twenty (n = 20) publicly-traded retailers produces a sample average price-to-earnings (P/E) ratio of 20.00 with sample standard deviation of 8.50. We are interested in testing hypothesis related to a possible population mean of 15.0. Each of the following is a valid conclusion EXCEPT which is not?

a) With 95.0% confidence, we reject a two-sided null hypothesis that the population's mean P/E ratio is 15.0

b) With 99.0% confidence, we reject a two-sided null hypothesis that the population's mean P/E ratio is 15.0

c) With 95.0% confidence, we accept a one-sided alternative hypothesis that the population's mean P/E ratio is greater than 15.0

d) With 99.0% confidence, we accept a one-sided alternative hypothesis that the population's mean P/E ratio is greater than 15.0


Answers: 314.1. A. t-stat of 1.17 implies one-sided confidence of about 86.5% SE = 4.850%/SQRT(10) = 1.53370% and the t-stat = (1.80% - 0)/1.53370% = 1.1736. The alternative hypothesis is the one we want to prove, in this case, that the population mean is positive.

Therefore, the null is that the population mean is less than or equal to zero; i.e., one-sided. The one-sided p-value = 13.5% such that we can only reject the null hypothesis with 86.5% confidence.

314.2. D. 2.40%

The one-tailed t-stat that is associated with 16.36% at 24 degrees of freedom is 1.00; e.g., T.INV(16.36%, 24) = -1.00 and T.DIST(-1.00, 24 d.f., true = CDF) = 16.36%. The standard error (SE) of the sample mean = 3.0%/SQRT(25) = 0.60%. Since the t-stat = 1.0, (3.0% - Rf)/0.60% = 1.0, such that Rf = 3.0% - 0.60% = 2.40%.

314.3. B. We FAIL to reject the two-sided null at 99.0%, but each of the other statements is valid.

As the standard error (SE) is 8.5/SQRT(20), the t-stat = (20-15)/[8.5/SQRT(20)] = 2.631. Because, at 95% confidence, the lookup t values are 2.093 (two-tailed) and 1.729 (one-tailed), the null is rejected at 95% under both the one- and two-tailed tests. At 99% confidence, the lookup t values are 2.861 (two-tailed) and 2.539 (one-tailed). Therefore, we can reject the 99% one-tailed null, but we cannot reject the 99% two-tailed null hypothesis.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/p1-t2-314-millers-one-and-two-tailed-hypotheses.7118/
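A minimal Python sketch of the 314.3 test statistic and its one- and two-sided p-values (scipy replaces the t lookup table):

from math import sqrt
from scipy.stats import t

n, xbar, mu0, s = 20, 20.0, 15.0, 8.5
t_stat = (xbar - mu0) / (s / sqrt(n))    # ~2.631
p_one = t.sf(t_stat, n - 1)              # one-tailed p ~ 0.8%  -> reject at 99%
p_two = 2 * p_one                        # two-tailed p ~ 1.6%  -> fail to reject at 99%
print(round(t_stat, 3), round(p_one, 4), round(p_two, 4))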


P1.T2.315. Miller's hypothesis tests, continued

AIMs: Describe the process of selecting and constructing a null hypothesis. Interpret the results of hypothesis tests with a specific level of confidence. Describe and apply the principle of Chebyshev's inequality.

315.1. Roger collects a set of 61 daily returns over a calendar quarter for the stock of XYZ corporation. He computes the sample's daily standard deviation, which is annualized in order to generate a sample volatility of 27.0%. His null hypothesis is that the true (population) volatility is 30.0%. Can he reject the null with 95% confidence?

a) No, the test statistic is 1.59 b) No, the test statistic is 48.60 c) Yes, the test statistic is 24.03 d) Yes, the test statistic is 72.57

315.2. A fund of funds has investments in 36 hedge funds. At the end of the year, the mean return of the constituent hedge funds was 13.0%. The standard deviation of the funds' returns was 9.0%. The benchmark return for the fund of funds was 10.0%. With 95.0% confidence, can we accept the one-sided alternative hypothesis that the fund of funds exceeded its benchmark; i.e., can we reject the one-sided null hypothesis that the fund of funds' true performance is less than or equal to 10.0%? (note: variation on Miller's Question 5.2)

a) Yes, true fund performance is greater than 10.0% as computed t-stat of 2.00 exceeds lookup value of 1.690

b) Yes, true fund performance is greater than 10.0% as computed t-stat of 2.00 does not exceed lookup value of 2.030

c) No, true fund performance is not greater than 10.0% as computed t-stat of 2.00 exceeds lookup value of 1.690

d) No, true fund performance is not greater than 10.0% as computed t-stat of 2.00 does not exceed lookup value of 2.030

315.3. Among a sample of annual fund returns, the mean is 11.0% with standard deviation of 2.0%. According to Chebyshev’s inequality, what percentage of returns fall within 6.0% and 16.0%?

a) At least 60% b) At most 60% c) At least 84% d) At most 84%


Answers:

315.1. B. No, the chi-square test statistic is 48.60

The chi-square test statistic = 60*27%^2/30%^2 = 48.60. This is within the two-sided chi-square lookup values, at 95% confidence and with 60 degrees of freedom, of ~40.5 (at 2.5%) and ~83.3 (at 97.5%; or right-sided at 2.5%), such that we fail to reject; i.e., the population variance might be 30%^2.

315.2. A. Yes, true fund performance is greater than 10.0% as the computed t-stat of 2.00 exceeds the lookup value of 1.690

The standard error = 9.0%/SQRT(36) = 1.5% and the computed t-stat = (13.0% - 10.0%)/1.5% = 2.00. The one-sided lookup value with 35 degrees of freedom = T.INV(95%, 35) = 1.690. Because 2.00 > 1.690, we can reject the one-sided null. Note that if the null hypothesis were two-sided, the lookup value would be 2.030 and we would not reject at 95.0%. Put another way--most efficiently--the one-sided confidence is 97.33% (p-value of 2.67%) and the two-sided confidence is 94.67% (p-value of 5.33%).

315.3. C. At least 84%

According to Chebyshev's inequality, we do not need to know the distribution, yet we can assert that AT LEAST (1 - 1/k^2)% of the sample falls within (k) standard deviations. In this case, 6% and 16% are 2.5 standard deviations from the mean of 11.0%; e.g., (16% - 11%)/2% = 2.5 sigma. Therefore, 1 - 1/2.5^2 = 84.0%, such that AT LEAST 84% falls within the interval [6%, 16%] and 16% or LESS falls outside the interval, in the tails.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/p1-t2-315-millers-hypothesis-tests-continued.7128/
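A minimal Python sketch of the 315.1 chi-square variance test (scipy replaces the chi-square lookup table):

from scipy.stats import chi2

n, s_vol, sigma0 = 61, 0.27, 0.30
stat = (n - 1) * s_vol**2 / sigma0**2    # 48.60
lo = chi2.ppf(0.025, n - 1)              # ~40.5
hi = chi2.ppf(0.975, n - 1)              # ~83.3
print(round(stat, 2), lo < stat < hi)    # 48.6 True -> fail to reject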


Appendix: More Gujarati Econometrics

Gujarati: Essentials of Econometrics, 3rd Edition, Chapters 1-5

Gujarati.02.12: A random variable (r.v.) X has the following PMF, which assigns probabilities b, 2b, 3b, 4b, and 5b to the outcomes x = 0, 1, 2, 3, and 4, respectively:

2.12.a. What is the value of b? Why? 2.12.b. Find P(X ≤ 2); P(X ≤ 3); P(2 ≤ X ≤ 3).

Answers:

2.12.a. For a PMF (discrete), the sum of mutually exclusive and cumulatively exhaustive probabilities must equal 1.0 (100%). This is a key property of probabilities. If this seems abstract, consider a six-sided die where f(x) = P[X = x] = 1/6 for each outcome x in {1, 2, 3, 4, 5, 6}; the sum of all possible f(x) must equal 1.0. So, in this case, b + 2b + 3b + 4b + 5b = 15b and 15b = 1, so b = 1/15.

2.12.b. P(X ≤ 2) = 6/15; P(X ≤ 3) = 10/15; P(2 ≤ X ≤ 3) = P(X = 2) + P(X = 3) = 3/15 + 4/15 = 7/15
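As a quick check of this arithmetic, a minimal Python sketch (the support x = 0, 1, 2, 3, 4 with f(x) = b*(x + 1) is our reconstruction of the question's table):

probs = {x: (x + 1) / 15 for x in range(5)}   # b = 1/15, so probabilities sum to 1.0
assert abs(sum(probs.values()) - 1.0) < 1e-12
print(sum(p for x, p in probs.items() if x <= 2))   # P(X <= 2) = 6/15
print(probs[2] + probs[3])                          # P(2 <= X <= 3) = 7/15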

Tip: in a discrete PMF, p(x) = f(x), but this is not the case in a continuous PDF. This is a common confusion. Take the normal distribution: we cannot say that p(x) = f(x). Why not? Because for a continuous variable, f(x) is a density, not a probability; the probability of any single point is zero, and probabilities are instead areas under the density.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-02-12.1135/


Gujarati.02.13: The following table gives the joint probability distribution, f(X,Y), of two random variables X and Y.

a) Find the marginal (i.e., unconditional) distributions of X and Y, namely, f(X) and f(y).

b) Find the conditional PDF, f(X|Y) and f(Y|X).

Answers:

a. To find the marginal (i.e., unconditional) distributions of X and Y, sum the joint probabilities across the other variable; e.g., f(X = 1) = 0.2, etc.

b. To find the conditional PDFs, f(X|Y) and f(Y|X), divide the joint probability by the marginal probability of the given value; e.g.:
f(X = 1 | Y = 1) = 0.03/0.15 = 0.20
f(X = 2 | Y = 1) = 0.06/0.15 = 0.40

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-02-13.1133/


Gujarati.03.08: An r.v. X has the following PMF (the same distribution as in Gujarati.02.12 above; i.e., f(x) = (x + 1)/15 for x = 0, 1, 2, 3, 4):

a. Find the expected value of X. b. What is the variance and standard deviation of X? c. What is the coefficient of variation of X? d. Find the skewness and kurtosis values of X.

Answer:

E[X] = 40/15 = 2.6667

Variance = E[X^2] - (E[X])^2 = 130/15 - (40/15)^2 = 1.5556, such that the standard deviation = SQRT(1.5556) = 1.2472

Coefficient of variation = Standard Deviation[X] / E[X]

Skew = third moment about the mean / cube of the standard deviation

Kurtosis = fourth moment about the mean / standard deviation^4

Note this distribution has slight negative skew and light (or skinny) tails as kurtosis < 3. Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-03-08.1132/


Gujarati.03.09: The following table gives the anticipated 1-year rates of return from a certain investment and their probabilities. ANTICIPATED 1-YEAR RATE OF RETURN FROM A CERTAIN INVESTMENT.

a. What is the expected rate of return from this investment?
b. Find the variance and standard deviation of the rate of return.
c. Find the skewness and kurtosis coefficients.
d. Find the cumulative distribution function (CDF) and obtain the probability that the rate of return is 10 percent or less.

Answers:

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-03-09.1169/


Gujarati.03.10: The following table gives the joint PDF of random variables X and Y, where X= the first-year rate of return (%) expected from investment A, and Y= the first-year rate of return (%) expected from investment B.

a. Find the marginal distributions of Y and X. b. Calculate the expected rate of return from investment B. c. Find the conditional distribution of Y, given X=20. d. Are X and Y independent random variables? How do you know?

Answer: Click here for spreadsheet: http://sheet.zoho.com/public/btzoho/g03-10-answer

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-03-10.1170/


Gujarati.03.17: Find the expected value of the following PDF:

Answer:

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-03-17.1171/


Gujarati.03.21: 3.21 According to Chebyshev's inequality, what percentage of any set of data must lie within c standard deviations on either side of the mean value if (a) c = 2.5 and (b) c = 8?

Answers: 3.21(a) At least 1 - 1/2.5^2 = 1 - 0.16 = 84% of the data must lie within 2.5 standard deviations of the mean.

3.21(b) At least 1 - 1/8^2 = 1 - 0.015625 = 98.4375% of the data must lie within 8 standard deviations of the mean.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-03-21.3726/


Gujarati.04.01: 04.01 Explain the meaning of

a. Degrees of freedom. b. Sampling distribution of an estimator. c. Standard error

04.01.d How many degrees of freedom (d.f.) attach to a student's t distribution? 04.01.e In the FRM-assigned Gujarati, we study four sampling distributions. What are they and when are they used? 04.01.f. What is the difference between a standard error and a standard deviation? 04.01.g. What is the difference between an estimator and an estimate?


Answers:

04.01.a. The number of independent observations available to compute an estimate, e.g., the sample mean or the sample variance.

04.01.b. The probability distribution of an estimator.

04.01.c. The standard deviation of an estimator.

04.01.d. If you said (n-1), we forgive you (it's a fine answer if you are thinking only about a sample mean), but it's incomplete. A sample mean is characterized by a student's t with (n-1) d.f., so n-1 is correct for a sample mean. But what about the student's t distribution for the OLS estimators in a linear regression? In OLS, the student's t loses one degree of freedom for each estimated coefficient; e.g., a two-variable regression requires the estimate of slope and intercept, so its d.f. = n-2. A three-variable OLS regression has d.f. = n-3. (Note that strictly, it's not the number of variables, but rather the intercept plus the slope coefficients; they happen to be equal because both equal the number of partial slope coefficients + 1, where the +1 is either the Y-dependent variable or the intercept. But the d.f. is really based on the intercept plus the number of partial slope coefficients; the Y-dependent is a variable.)

04.01.e. The normal and student's t for a sample mean (i.e., versus a hypothesized population mean); the chi-squared for a sample variance (i.e., versus a hypothesized population variance); and the F distribution to compare two sample variances.

04.01.f. It's merely semantic; it is fine to think of the standard error as a standard deviation: the standard error is the standard deviation of an estimator. And it's maybe more helpful because we are typically converting an observation into unit/standard form thusly: (observation - hypothesis)/standard error.

04.01.g. The estimator is the formula or recipe (e.g., sample mean = sum of observations / n) and the estimate is the value (e.g., sample mean = 5).

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-04-01.3648/


Gujarati.04.03: 04.03: Consider a random variable (r.v.) X ~ N(8, 16)

a. What is the probability distribution of the sample mean X obtained from a random sample from this population?

b. Does your answer to (a) depend on the sample size? Why or why not?

c. Assuming a sample size of 25, what is the probability of obtaining an X of 6 or less?

d. Assume instead that the random variable has a mean of 8 but unknown variance, where the observed sample variance is 16. Assuming a sample size of 25, what is the probability of obtaining an X of 6 or less?

e. We used two different distributions above for the sample mean. What is their difference in the number of parameters?

f. How do the variance, skew, and kurtosis of the two distributions employed here (i.e., to characterize the distribution of a sample mean) compare?

Answers:

04.03.a. X ~ N(8, 16/n)

04.03.b. The variance of X depends on the sample size.

04.03.c. Since X ~ N(8, 16/25), the standard error = SQRT(16/25) = 0.80 and the probability that Z ≤ (6 - 8)/0.80 = -2.5 is 0.0062. In Excel, =NORMDIST(6, 8, SQRT(16/25), TRUE) = 0.62%.

04.03.d. With unknown population variance, we use the student's t distribution: =TDIST(ABS((6-8)/SQRT(16/25)), 25 - 1 d.f., 1 tail) = TDIST(2.5, 24, 1) = 0.98%. Note the student's t has a slightly heavier tail.

04.03.e. The standard normal has zero parameters and the student's t has one parameter (d.f.).

04.03.f. Both have skew = 0 (symmetrical). The variance of the student's t = d.f./(d.f. - 2); see http://en.wikipedia.org/wiki/Student's_t_distribution. The variance of the standard normal = 1. The excess kurtosis of the student's t = 6/(d.f. - 4), while the standard normal, by definition, has zero excess kurtosis (kurtosis = 3). As d.f. increases, the student's t approximates the normal (i.e., the student's t is "asymptotically normal").

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-04-03.3649/
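A minimal Python sketch of parts (c) and (d), replacing the Excel functions with scipy equivalents:

from math import sqrt
from scipy.stats import norm, t

se = sqrt(16 / 25)                 # standard error = 0.80
z = (6 - 8) / se                   # -2.5 standard errors
print(norm.cdf(z))                 # known variance: ~0.0062 (0.62%)
print(t.cdf(z, 24))                # estimated variance, 24 d.f.: ~0.0098 (0.98%)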


Gujarati.04.04: 04.04 What is the difference between the t distribution and the normal distribution? When should you use the t distribution?

04.04b. What determines our choice of the degrees of freedom (d.f.)?

04.04c. Which distribution do we use (student’s t or normal) to characterize a sample mean?

04.04d. Which distribution do we use (student’s t or normal) to characterize the ordinary least squares (OLS) estimators (e.g., intercept, slope, partial slope coefficients in a multivariate regression)?

Answer:

04.04. Although both are symmetrical, the t distribution is flatter than the normal distribution. But as the degrees of freedom increase, the t distribution approximates the normal distribution.

04.04.b. Gujarati (page 77 of Basic Econometrics): the term "number of degrees of freedom" means the total number of observations in the sample (= n) less the number of independent (linear) constraints or restrictions put on them. In other words, it is the number of independent observations out of a total of n observations. For example, in a univariate (two-variable) regression, before the residual sum of squares (RSS) can be computed, slope and intercept must be obtained. These two estimates therefore put two restrictions on the RSS; therefore, there are n-2, not n, independent observations to compute the RSS. Following this logic, in the three-variable regression the RSS will have n-3 d.f., and for the k-variable model it will have n-k d.f. The general rule is: d.f. = (n - number of parameters estimated).

04.04.c. A normal if the population variance is known; a student's t if the population variance is unknown (typically the case).

04.04.d. The regression estimators are linear functions of, by definition, a normally distributed variable. Therefore, the estimators are characterized by the normal distribution. However, this requires our standard error to be informed (again) by a known variance. In this case, our variance (that informs the standard error of the estimator) is itself estimated; therefore, the student's t distribution is used.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-04-04.3658/


Gujarati.04.06: 04.06 True or false. For a sufficiently large d.f., the t, the chi-square, and the F distributions all approach the unit normal distribution.

Answer: 04.06 True. Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-04-06.3659/

Gujarati.04.11: 04.11 In problem 4.10 (Profits (X) in an industry consisting of 100 firms are normally distributed with a mean value of $1.5 million and a standard deviation (s.d.) of $120,000.), if 10 percent of the firms are to exceed a certain profit, what is that profit?

Answer: 04.11 Since P(Z ≤ 1.28) is about 0.10, we obtain 1.28 = (X - 1.5) / 0.12, which gives X = $1.6536 million as the required figure. Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-04-11.3660/


Gujarati.04.15: 04.15 Continue with problem 4.14 (If X ~ N(10, 3) and Y ~ N(15, 8)) but now assume that X and Y are positively correlated with a correlation coefficient of 0.6. What is the probability distribution of:

a. X + Y b. X – Y c. 3X d. 4X + 5Y

Answers:

04.15. In answering this question, note that if W = aX + bY, then E(W) = a*µx + b*µy and var(W) = a²*var(X) + b²*var(Y) + 2ab*ρ*σx*σy (see footnote 3, p. 80 of the text).

a. (X + Y) ~ N(25, 16.88). Note: σx = 1.73 and σy = 2.83.

b. (X - Y) ~ N(-5, 5.12)

c. 3X ~ N(30, 27)

d. (4X + 5Y) ~ N(115, 365.58), approximately.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-04-15.3661/
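A minimal Python sketch of the variance formula applied to all four parts (the helper name combo is ours):

from math import sqrt

# Gujarati 4.15: X ~ N(10, 3), Y ~ N(15, 8), rho = 0.6; W = aX + bY
def combo(a, b, mx=10, my=15, vx=3, vy=8, rho=0.6):
    mean = a * mx + b * my
    var = a**2 * vx + b**2 * vy + 2 * a * b * rho * sqrt(vx) * sqrt(vy)
    return round(mean, 2), round(var, 2)

for a, b in [(1, 1), (1, -1), (3, 0), (4, 5)]:
    print((a, b), combo(a, b))  # (25, 16.88), (-5, 5.12), (30, 27), (115, 365.58)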


Gujarati.04.17: 04.17 [source: 2009 FRM sample exam] A random sample of 10 female SAT scores on the math test gave a sample variance of 85.21. Knowing that the true variance is 83.88, what is the probability of obtaining such a sample value? Which probability distribution do you use to answer this question?

04.17b. Why did we select this distribution to solve the problem?

04.17c. Why are we using (n-1) instead of (n)?

04.17d. [tough] What is the expected median sample variance (i.e., what sample variance implies a p value of 50%), what is the expected mean sample variance, and why are they different?


Answer:

04.17. If it is assumed that the SAT scores are normally distributed with mean and variance as given, it can be shown that (n-1)*(S^2/sigma^2) ~ X²(n-1); i.e., the ratio is a chi-squared variable with (n-1) degrees of freedom. In the present example, we have: X² = 9*(85.21/83.88) = 9.14, which is a chi-square variable with 9 d.f. From the X² table, the probability of obtaining a chi-square value of 9.14 or greater is somewhere between 25% and 50%; the exact p value is 42.45%. In Excel, =CHIDIST(9.14, 9) = 42.4%.

04.17.b. The sum of squared normal variables follows a chi-square distribution. A variance is a weighted sum of squared deviations from the sample mean where, under our assumption, the observations are normally distributed. The chi-squared variable is merely the ratio of the sample variance to the population variance (scaled by the degrees of freedom). So, the key memory point is: we use the chi-square to test a sample variance (against a hypothesized population variance). Here is the analogy:

To test a sample mean against a hypothesized population mean: student's t (or normal, if the population variance is known)

To test a sample variance against a hypothesized population variance: chi-squared

Tip: keep in mind the variance is in squared units and the chi-squared distribution characterizes the sum of squared normal variables.

04.17.c. (n-1) is the degrees of freedom. In calculating the sample variance, one "independent observation" is consumed (lost) to calculate the sample mean.

04.17.d. =CHIDIST(9*77.77/83.88, 9) = ~50%; i.e., the chi-squared variable implies a median sample variance near 78. Using a lookup table, we would find the cell corresponding to d.f. = 9 (row) and p-value = 0.50 (column); this lookup value = 8.34283, such that 8.34283 = 9*(sample variance/83.88) and sample variance = 8.34283*83.88/9 = 77.77.

The mean of a chi-squared variable equals its degrees of freedom (!). The mean value of the chi-squared variable in this case is 9, such that (as we'd expect) the expected (mean) value of the sample variance = 83.88 (i.e., the population variance).

Why is the median less than the mean? Because the chi-squared distribution has positive skew (skew to the right).

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-04-17.3662/
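A minimal Python sketch of the p-value and of the median-versus-mean point in 04.17.d (scipy replaces CHIDIST and the lookup table):

from scipy.stats import chi2

df = 9
stat = df * 85.21 / 83.88                 # ~9.14
print(chi2.sf(stat, df))                  # p-value ~0.424 (42.4%)

print(chi2.ppf(0.5, df) * 83.88 / df)     # median sample variance ~77.8
print(chi2.mean(df) * 83.88 / df)         # mean sample variance = 83.88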


Gujarati.04.18: 04.18 The 10 economic forecasters of a random sample were asked to forecast the rate of growth of the real gross national product (GNP) for the coming year. Suppose the probability distribution of the r.v. (the forecast) is normal.

a. The probability is 0.10 that the sample variance of forecast is more than X percent of the population variance. What is the value of X?

b. If the probability is 0.95 so that the sample variance is between X and Y percent of the population variance, what will be the values of X and Y?

Answers:

04.18.a. We want P[(S²/σ²) > X] = 0.10. That is, P[(n-1)S²/σ² > (n-1)X] = 0.10. From the X² table, we find that for 9 d.f., 9X = 14.6837, or X = 1.6315. That is, the probability is 10 percent that S² will be more than 163% of (i.e., 63% greater than) the population variance.

04.18.b. Following the same logic, it can be seen that P[(n-1)X ≤ (n-1)S²/σ² ≤ (n-1)Y] = 0.95. Using the X² table, we find the X and Y values as 0.3000 and 2.1136, respectively. (Note: for 9 d.f., P(X² > 2.70039) = 0.975 and P(X² > 19.0228) = 0.025.)

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-04-18.3663/


Gujarati.04.20: 04.20 The same microeconomics examination was given to students at two different universities. The results were as follows:

a. The Xs denote the grade averages in the two samples (average grades were 70 and 75); the Ss denote the two sample variances (sample variances were 9.0 and 7.2); and the ns denote the sample sizes (sample sizes were 50 and 40).

b. How would you test the hypothesis that the population variances of the test scores in the two universities are the same?

c. Which probability distribution would you use?

d. What are the assumptions underlying that distribution?

Answer:

4.20. Use the F distribution. Assuming both samples are independent and come from normal populations, and that the two population variances are the same, it can be shown that the ratio of the sample variances, S1²/S2², follows an F distribution with (n1 - 1) and (n2 - 1) degrees of freedom.

In this example, F = 9.0/7.2 = 1.25 with (49, 39) degrees of freedom. The probability of obtaining an F value of 1.25 or greater is 0.2371, so we cannot reject the hypothesis of equal population variances. In Excel, the function used is FDIST(higher variance/lower variance, sample size - 1, sample size - 1) = FDIST(1.25, 49, 39).

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-04-20.3664/
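A minimal Python sketch of the same F-test (scipy's f.sf replaces Excel's FDIST):

from scipy.stats import f

F = 9.0 / 7.2                      # larger sample variance on top: 1.25
p = f.sf(F, 50 - 1, 40 - 1)        # P[F >= 1.25] with (49, 39) d.f.
print(round(F, 2), round(p, 4))    # 1.25, ~0.2371 -> fail to reject at 5%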


Gujarati.05.01: 05.01 What is the distinction between each of the following pairs of terms?

a) Point estimator and interval estimator. b) Null and alternative hypotheses. c) Type I and type II errors. d) Confidence coefficient and level of significance. e) Type II error and power.

05.01f. In Basel II, banks that use their internal value at risk (VaR) model (IMA) must backtest the model by comparing actual trading losses to their VaR model (each loss observation that exceeds the 99% VaR is called an "exception"). The backtest framework is implicitly anchored in the following null hypothesis: "the bank's VaR model is accurate." Which is the more costly error (Type I or Type II)? How does Basel deal with this uncertainty? 05.01g. In de Servigny's Neyman-Pearson decision rule used to classify firms into good/bad credit risks, the null hypothesis is (implicitly): the firm is going to default. Which are the Type I and Type II errors, and which is more costly?


Answers:

05.01.a. A single numerical value of a (population) parameter is known as a point estimate. An interval estimate provides a range of values that will include the true parameter with a certain degree of confidence (i.e., probability). A point estimator is a formula or rule that tells how to obtain the point estimate.

05.01.b. A null hypothesis is the maintained hypothesis which is tested against another hypothesis, called the alternative hypothesis.

05.01.c. A Type I error is the error of rejecting a hypothesis when it is true. A Type II error is the error of accepting (i.e., not rejecting) a false hypothesis.

05.01.d. The probability of committing a type I error is known as the level of significance (or the size of the test). One minus the probability of committing a type I error is called the confidence coefficient.

05.01.e. The probability of accepting a false hypothesis is called a type II error, and (1 - prob. of type II error), that is, the probability of not committing a type II error, is called the power of the test.

05.01.f. If the null hypothesis is "the bank's VaR model is accurate," then:

Type I error (The error of rejecting a hypothesis when it is true) is to REJECT a GOOD VaR Model

Type II error (The error of accepting [not rejecting] a false hypothesis) is to ACCEPT a BAD VaR Model.

The Type II error is therefore more costly as it implies insufficient capital. Basel deals with this *unavoidable* uncertainty with the three zone (red/yellow/green) “traffic light” approach; e.g., between 5 and 9 exceptions fall into a yellow zone


05.01.g. If the null hypothesis is "the firm is going to default" (a "bad" firm), then:

Type I error (The error of rejecting a hypothesis when it is true) is to make a loan (accept as a credit risk) to a BAD firm

Type II error (the error of accepting [not rejecting] a false hypothesis) is to deny a loan to a GOOD firm

The Type I error is worse; per de Servigny: "a Type I error damages the wealth of the bank, given its utility function, whereas a Type II error is only a false alarm." Note: although Type II errors are often more costly, we cannot say this as a general rule. Rather, it depends entirely on the definition of the null hypothesis.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-05-01.3669/

Gujarati.05.02: 05.02 What is the meaning of:

a) Statistical inference.

b) Sampling distribution.

c) Acceptance region.

d) Test statistic.

e) Critical value of a test.

f) Level of significance.

g) The p value.

This Gujarati chapter concerns a mini-case study: among a small sample of 28 NYSE companies, the mean price-to-earnings (P/E) ratio is 23.25. The sample standard deviation is 9.49.

05.02.h. What distribution characterizes the sampling distribution of the sample mean? Why?

05.02.i. Which two (or three) variables determine the size of the acceptance region?

05.02.j. In Gujarati's case, the null hypothesis is: population mean = 18.5. In which case, the test statistic is 2.65. Explain this test statistic in a single sentence that includes the phrase "standard deviation."

05.02.k. Which two variables determine the critical value?

05.02.l. In order to compute a p-value, do we need (i) the test statistic, (ii) the critical value, or (iii) both?

05.02.m. In this example, the p-value is about 1.3%. Use this p-value in a sentence!


Answers:

05.02.a. The two branches of classical statistics, estimation of parameters and testing hypotheses about parameters, constitute statistical inference.

05.02.b. The probability distribution of an estimator.

05.02.c. It is synonymous with a confidence interval.

05.02.d. A statistic used to decide whether a null hypothesis is rejected or not.

05.02.e. That value of the test statistic which demarcates the acceptance region from the rejection region.

05.02.f. It is the probability of committing a type I error.

05.02.g. The exact level of significance of a test statistic.

05.02.h. Per the central limit theorem, the normal characterizes the sampling variation of the sample mean (recall: the underlying does not need to be normal!). However, since we do not know the population variance, we use the student's t (which itself is asymptotically normal).

05.02.i. The sample mean anchors the acceptance region. The confidence intervals are a function of (i) the critical value and (ii) the standard error.

05.02.j. "This hypothesized population mean (18.5) is 2.65 standard deviations (or standard errors; they are functionally the same!) away from the observed sample mean."

05.02.k. (1) Significance (or confidence), and (2) degrees of freedom.

05.02.l. We need the test statistic. For a given d.f., a test statistic (being a distance expressed in standard errors) solves for a p-value. But we do not need a critical value; the critical value is a function of confidence and corresponds to a p-value for a "standardized" t distribution.

05.02.m. We can reject the null (i.e., we can reject the hypothesis that the population mean = 18.5) at a significance of 1.3%, but no lesser significance. Or, I prefer: we can reject the null with confidence of 98.7% (1 - 1.3%) but with no greater confidence.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-05-02.3670/


Gujarati.05.03: 05.03 Explain carefully the meaning of:

a) An unbiased estimator.

b) A minimum variance estimator.

c) A best, or efficient, estimator.

d) A linear estimator.

e) A best linear unbiased estimator (BLUE).

f) Are ordinary least squares (OLS) estimators BLUE?

g) If the population is normal, is the sample median an unbiased estimator of population mean?

h) If the population is lognormal, is the sample median an unbiased estimator of population mean?

i) Which property is naturally called a “repeated sampling property?”

j) If we compute sample variance as the sum of squared deviations (from sample mean) divided by (n; where n=sample size), which property does that estimator exhibit?

k) If we compute sample variance as the sum of squared deviations (from sample mean) divided by (n-1), which property does that estimator exhibit?


Answers:

05.03.a. If the average, or expected, value of an estimator coincides with the true value of the parameter, that estimator is known as an unbiased estimator.

05.03.b. In a group of competing estimators of a parameter, the one with the least variance is called a minimum variance estimator.

05.03.c. In the class of unbiased estimators, the one with the least variance is called an efficient estimator.

05.03.d. An estimator which is a linear function of the observations.

05.03.e. An unbiased linear estimator with the least possible variance.

05.03.f. Yes, the Gauss-Markov theorem demonstrates that OLS regression estimators are BLUE; see http://en.wikipedia.org/wiki/Gauss–Markov_theorem

05.03.g. Yes.

05.03.h. No: the lognormal has positive skew such that mean > median.

05.03.i. Unbiasedness.

05.03.j. This is the maximum likelihood estimator (MLE); it is biased, as its expected value is slightly less than the population variance.

05.03.k. This is an unbiased estimator.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-05-03.3671/


Gujarati.05.04: 05.04 State whether the following statements are true, false, or uncertain. Justify your answers.

a) An estimator of a parameter is a random variable, but the parameter is non-random, or fixed.

b) An unbiased estimator of a parameter, say µx, means that it will always be equal to µx.

c) An estimator can be a minimum variance estimator without being unbiased.

d) An efficient estimator means an estimator with minimum variance.

e) An estimator can be BLUE only if its sampling distribution is normal.

f) An acceptance region and a confidence interval for any given problem means the same thing.

g) A type I error occurs when we reject the null hypothesis even though it is false.

h) A type II error occurs when we reject the null hypothesis even though it may be true.

i) As the degrees of freedom (d.f.) increase indefinitely, the t distribution approaches the normal distribution.

j) The central limit theorem states that the sample mean is always distributed normally.

k) The terms level of significance and p value mean the same thing.


Answers:

05.04.a. True. In classical statistics the parameter is assumed to be some fixed number, although unknown.

05.04.b. False. Unbiasedness means that E(µ-hat x) = µx; i.e., the estimator equals µx on average across repeated samples, not that any single estimate equals µx.

05.04.c. True.

05.04.d. False. To be efficient, an estimator must be unbiased and it must have minimum variance.

05.04.e. False. No probabilistic assumption is required for an estimator to be BLUE.

05.04.f. True.

05.04.g. False. A type I error is when we reject a true hypothesis.

05.04.h. False. A type II error occurs when we do not reject a false hypothesis.

05.04.i. True. This can be proved formally.

05.04.j. False, generally. Only as the sample size increases indefinitely will the sample mean be normally distributed. If, however, the sample is drawn from a normal population to begin with, the sample mean is distributed normally regardless of the sample size.

05.04.k. Uncertain. The p value is the exact level of significance. If the chosen level of significance, say a = 5%, coincides with the p value, the two will mean the same thing.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-05-04.3672/


Gujarati.05.09: 05.09 Assume that the per capita income of residents in a country is normally distributed with a mean μ = $1000 and variance σ² = 10,000 ($ squared).

a. What is the probability that the per capita income lies between $800 and $1200?

b. What is the probability that it exceeds $1200?

c. What is the probability that it is less than $800?

d. Is it true that the probability of per capita income exceeding $5000 is practically zero?

Answers: Spreadsheet here: http://sheet.zoho.com/public/btzoho/gujarati-5-9

05.09.a. P(-2 ≤ Z ≤ 2) = 0.9544

05.09.b. P(Z ≥ 2) = 0.0228

05.09.c. P(Z ≤ -2) = 0.0228

05.09.d. Yes (Z = (5,000 - 1,000)/100 = 40, an extremely high value)

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-05-09.3676/

Gujarati.05.10: 05.10 Continuing with problem 5.9, based on a random sample of 1,000 members, suppose that you find the sample mean income, X̄, to be $900.

a. Given that μ = $1000, what is the probability of obtaining such a sample mean value?

b. Based on the sample mean, establish a 95% confidence interval for μ and find out if this confidence interval includes μ = $1000. If it does not, what conclusions would you draw?

c. Using the test of significance approach, decide whether you want to accept or reject the hypothesis that μ = $1000. Which test do you use and why?

Answers: Spreadsheet here @ http://sheet.zoho.com/public/btzoho/gujarati-5-9

Note that the sample mean X̄ ~ N(1,000, σ²/n = 10,000/1,000 = 10).

05.10 a. Practically zero: Z = (900 − 1,000)/√10 = −31.6228, and P(Z ≤ −31.6228) is negligible.
05.10 b. The 95% confidence interval is 893.8019 ≤ μx ≤ 906.1981. With 95% confidence we can say that the true mean is not equal to 1,000.
05.10 c. Reject the null hypothesis. Use the normal distribution because the sample size is reasonably large.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-05-10.3677/
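The same arithmetic, sketched in Python (scipy assumed available):

    from math import sqrt
    from scipy.stats import norm

    mu0, sigma2, n, xbar = 1000.0, 10_000.0, 1000, 900.0
    se = sqrt(sigma2 / n)                      # standard error = sqrt(10) ~ 3.1623

    z = (xbar - mu0) / se                      # ~ -31.62: essentially impossible under the null
    z975 = norm.ppf(0.975)                     # ~ 1.96
    print(xbar - z975 * se, xbar + z975 * se)  # ~ (893.80, 906.20): excludes 1,000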

Gujarati.05.13: 05.13 Based on a random sample of 10 values from a normal population with mean (mu) and standard deviation (sigma), you calculated that X̄ = 8 and the sample standard deviation = 4. Estimate a 95% confidence interval for the population mean. Which probability distribution do you use? Why?

05.13b. Assume instead the atypical scenario in which we somehow know that the population standard deviation = 4. What is the 95% confidence interval for the population mean?

05.13c. What is the 95% confidence interval for the population standard deviation?

Answer: 05.13 Use the t distribution, since the true variance is unknown. For 9 d.f., the 5% two-tailed critical t value is 2.262 and the standard error = 4/√10 = 1.2649. Therefore, the 95% CI is: 8 ± 2.262 × 1.2649 = (5.1388, 10.8612).

05.13b. We can use the normal distribution. The 95% two-tailed normal deviate = NORMSINV(97.5% = 0.5 + 0.5 × 95%) = 1.96, and the standard error = SQRT[16/10] = 1.2649. Confidence interval: 8 − (1.96)(1.2649) < population mean < 8 + (1.96)(1.2649), i.e., 5.52 < mu < 10.48. (Note: reassuringly, this normal-based interval is slightly tighter than the t-based interval, as it should be.)

05.13c. We use the chi-square distribution. Critical chi-square values:
=CHIINV(2.5%, 9 d.f.) = 19.02
=CHIINV(97.5%, 9 d.f.) = 2.70
Lower limit = sample variance × d.f. / critical value [@ 2.5%] = 16 × 9/19.02 = 7.57 [sigma^2]
Upper limit = sample variance × d.f. / critical value [@ 97.5%] = 16 × 9/2.70 = 53.33 [sigma^2]
Confidence interval = SQRT[7.57] < sigma < SQRT[53.33], i.e., 2.75 < sigma < 7.30

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-05-13.3678/
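The chi-square interval in 05.13c can be reproduced in Python; note that Excel's =CHIINV(p, d.f.) returns the right-tail critical value, which corresponds to scipy's chi2.ppf(1 − p, d.f.). A minimal sketch (our variable names):

    from math import sqrt
    from scipy.stats import chi2

    n, s2 = 10, 16.0                   # sample size and sample variance (s = 4)
    df = n - 1

    chi_hi = chi2.ppf(0.975, df)       # ~19.02, mirrors =CHIINV(2.5%, 9)
    chi_lo = chi2.ppf(0.025, df)       # ~2.70,  mirrors =CHIINV(97.5%, 9)

    var_lo = df * s2 / chi_hi          # ~7.57  [sigma^2]
    var_hi = df * s2 / chi_lo          # ~53.33 [sigma^2]
    print(sqrt(var_lo), sqrt(var_hi))  # ~ (2.75, 7.30) for sigma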

Gujarati.05.14: 05.14 You are told that X is approximately normally distributed, X ~ N(mu = 8, sigma^2 = 36). Based on a sample of 25 observations, you find that the sample mean = 7.5.

a) What is the sampling distribution of the sample mean?

b) What is the probability of obtaining a 7.5 or less?

c) From your answer in part (b) of this problem, could such a sample value have come from the preceding population?

Answers:
05.14 a. The sample mean ~ N(8, 36/25) per the central limit theorem; its standard error = 6/√25 = 1.2.
05.14 b. Z = (7.5 − 8)/1.2 = −0.4167. Therefore, P(Z ≤ −0.4167) = 0.3372.
05.14 c. The 95% CI for the sample mean = 8 ± 1.96 × 1.2 = (5.6480, 10.3520). Since this interval includes the value of 7.5, such a sample could have come from this population.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-05-14.3691/
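As a sanity check on 05.14, the sampling distribution of the mean is easy to simulate. The numpy sketch below (our own construction, not part of the source) confirms the standard error of 1.2 and a left-tail probability near 0.337:

    import numpy as np

    rng = np.random.default_rng(42)
    mu, sigma, n = 8.0, 6.0, 25   # population N(8, 36), samples of size 25

    # Draw many samples and compute each sample's mean
    means = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

    print(means.mean())           # ~8.0
    print(means.std())            # ~1.2 = sigma / sqrt(n)
    print((means <= 7.5).mean())  # ~0.337, close to P(Z <= -0.4167) = 0.3372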

Gujarati.05.17:

05.17 Let X be a random variable with mean μx [the distributional details and the estimator formulas are not reproduced in this extract]. A random sample of three observations was obtained from this population. Consider the following two estimators of μx, û1 and û2:

a. Is û1 an unbiased estimator of μx? What about û2?

b. If both estimators are unbiased, which one would you choose? (Hint: Compare the variances of the two estimators.)

c. What does it mean for an estimator to be “unbiased?” What is the practical acid-test?

d. Which MS Excel variance function returns an unbiased sample variance: =VARP(), =VAR() or =VAR()*(n-1)/n?

e. Are OLS estimators (e.g., slope, intercept) unbiased?

f. Are BLUE estimators unbiased?

Answers: 05.17 a. E(û1) = μx [the derivation is not reproduced in this extract]; hence û1 is an unbiased estimator. Similarly, it can be shown that û2 is also unbiased.

05.17 b. [The variance computation is not reproduced in this extract.] Per the hint, compare the variances of the two unbiased estimators and choose the one with the smaller variance.

05.17 c. Unbiased: the expected value (the average value) of the estimator equals the population parameter. Because unbiasedness is a "repeated sampling property," it is proven by repeated sampling: as we increase the number of samples, the average sample estimate will converge on the population parameter (if the estimator is unbiased). (Note: the estimator is the recipe/formula that produces the estimate value.)

05.17 d. =VAR() returns the unbiased sample variance because it is the sum of squared deviations divided by (n − 1). Dividing by n instead, as =VARP() does, returns the population variance, which is also the MLE estimator.

05.17 e. Yes.

05.17 f. Yes, the "U" in BLUE refers to unbiased.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-05-17.3692/
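To illustrate the "repeated sampling" acid-test in 05.17.c and the =VAR() versus =VARP() distinction in 05.17.d, here is a minimal simulation sketch (Python/numpy; the standard normal population and trial count are our own illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(7)
    mu, sigma, n, trials = 0.0, 1.0, 3, 200_000

    samples = rng.normal(mu, sigma, size=(trials, n))

    # Sample mean: the average of the estimates converges on mu -> unbiased
    print(samples.mean(axis=1).mean())         # ~0.0

    # ddof=1 divides by (n - 1), like Excel's =VAR(): unbiased
    print(samples.var(axis=1, ddof=1).mean())  # ~1.00 = sigma^2

    # ddof=0 divides by n, like =VARP(): biased (the MLE estimator)
    print(samples.var(axis=1, ddof=0).mean())  # ~0.67 = (n-1)/n * sigma^2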

Gujarati.05.18: 05.18 Refer to Problem 4.10 in Chapter 4 (Profits (X) in an industry consisting of 100 firms are normally distributed with a mean value of $1.5 million and a standard deviation (s.d.) of $120,000.). Suppose a random sample of 10 firms gave a mean profit of $900,000 and a (sample) standard deviation of $100,000.

a. Establish a 95% confidence interval for the true mean profit in the industry.

b. Which probability distribution do you use? Why?

Answers:
05.18 a. 900,000 ± 2.262 × (100,000/√10), that is, (828,469, 971,531).
05.18 b. The t distribution, since the true σ² is not known.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-05-18.3679/
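The same t-based interval, sketched in Python (scipy) for readers who want to reproduce the critical value of 2.262:

    from math import sqrt
    from scipy.stats import t

    n, xbar, s = 10, 900_000.0, 100_000.0
    se = s / sqrt(n)                               # ~31,623

    t_crit = t.ppf(0.975, df=n - 1)                # ~2.262 for 9 d.f.
    print(xbar - t_crit * se, xbar + t_crit * se)  # ~ (828,469, 971,531)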

Gujarati.05.19: 05.19 Refer to Example 4.14 in Chapter 4: Suppose a random sample of 20 observations from a normal population with variance (sigma^2) =8 gives a sample variance (S^2) = 16.

a. Establish a 95% confidence interval for the true σ².

b. Test the hypothesis that the true variance is 8.2.

c. Why aren’t we using a normal or student’s t distribution?

d. [tough] In this case we know the population variance, so why does d.f. = (n − 1)?

e. What is the mean and variance of this distribution?

Answers:
05.19 a. (19)(16)/32.8523 ≤ σ² ≤ (19)(16)/8.9065, that is, (9.2535, 34.1324). (Note: for 19 d.f., the 2.5% and 97.5% critical chi-square values are 32.8523 and 8.9065, respectively.)
05.19 b. Since the preceding interval does not include σ² = 8.2, reject the null hypothesis.

05.19c. The sum of independent normal random variables is itself a normal random variable. The FRM-assigned Rachev refers to this as the "summation stability" property of the normal distribution: "Another interesting and important property of normal distributions is their summation stability. If you take the sum of several independent random variables, which are all normally distributed with mean (mu) and standard deviation, then the sum will be normally distributed again." However, a sample variance is a sum of squared variables (deviations from the mean); as such, the sum of squared, independent standard normals follows a chi-square distribution, not a normal.

05.19d. Because we do not have 20 fully independent observations in computing the sample variance: each squared deviation depends on the sample mean, and the sample mean "consumes" one independent observation.

05.19e. The mean of a chi-square is its d.f. (!) and its variance is 2 × d.f. In this case, the mean = 19 and the variance = 38.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-05-19.3693/
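The facts in 05.19c and 05.19e are easy to verify by simulating sums of squared standard normals. A minimal numpy sketch (our own construction):

    import numpy as np

    rng = np.random.default_rng(19)
    df, trials = 19, 200_000

    # Sum of df squared independent standard normals ~ chi-square with df d.f.
    chi_sq = (rng.standard_normal((trials, df)) ** 2).sum(axis=1)

    print(chi_sq.mean())  # ~19 (mean of chi-square = d.f.)
    print(chi_sq.var())   # ~38 (variance = 2 x d.f.)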

Gujarati.05.20: 05.20 Sixteen cars were first driven with a standard fuel and then with Petrocoal, a gasoline with a methanol additive. The results of the nitrous oxide emissions (NOx) test are as follows:

Type of Fuel    Average NOx    Standard Deviation of NOx
Standard        1.075          0.5796
Petrocoal       1.159          0.6134

a. How would you test the hypothesis that the two population standard deviations are the same?

b. Which test do you use?

c. What are the assumptions underlying that test?

[my challenge!] Use the p-value in a sentence.

Answers: Excel/Zoho version is here.

05.20 a. F = (0.6134² / 0.5796²) ≈ 1.12, with (15, 15) d.f. The p value of obtaining an F value of 1.12 or greater is 0.4146. Therefore, one may not reject the null hypothesis that the two population variances are the same.

05.20 b. The F test is used. The basic assumption is that the two populations are normal.

05.20c. We could say: "We can reject the null hypothesis (i.e., that the variances are the same) with only 58.54% (= 1 − p value) confidence," which is equivalent to: "We can reject the null with an exact significance level of 41.46%, and at no smaller significance level." Or we can also say: "If we decide the population variances are different, we are a whopping 41.46% likely to be wrong." That is, if we reject the null, our potential error is Type I (i.e., to mistakenly reject a true null), and the significance level = the probability of a Type I error.

Discuss in forum here: https://www.bionicturtle.com/forum/threads/gujarati-05-20.3694/
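A sketch of the F-test arithmetic in Python (scipy), assuming, per the answer above, a one-tailed p value with (15, 15) degrees of freedom since each fleet contains 16 cars:

    from scipy.stats import f

    s_standard, s_petrocoal = 0.5796, 0.6134
    n1 = n2 = 16

    F = s_petrocoal**2 / s_standard**2  # larger sample variance on top: ~1.12
    p = f.sf(F, n2 - 1, n1 - 1)         # P(F >= 1.12) ~ 0.4146
    print(F, p)                         # cannot reject equal variances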