FSRM 16:958:587 Advanced Simulation Methods for Finance (Lecture 4) Min-ge Xie Department of Statistics & Biostatistics, Rutgers University

FSRM 16:958:587 Advanced Simulation Methods for Finance (Lecture 4) · 2016. 2. 17. · (Lecture 4) Min-ge Xie Department of Statistics & Biostatistics, Rutgers University. Bootstrap


FSRM 16:958:587 Advanced Simulation Methods for Finance

(Lecture 4)

Min-ge Xie

Department of Statistics & Biostatistics, Rutgers University


Bootstrap – a Simulation & Resampling Method

General statements (overly simplified intro/some key words)

The bootstrap method in statistics is a resampling (simulation) approach for making statistical inference about unknown parameters of an underlying population from which the observed sample data are generated.

Bootstrap samples are simulated "phantom" samples based on the observed sample data. Bootstrap distributions are derived from the bootstrap samples, and they can be used to make statistical inference.

Although it is a simulation method, it is a little different from the simulation techniques that we have learned before.


Motivation/background

The primary task of a statistician is to summarize a sample-based study and generalize the findings to the parent (underlying) population in a scientific manner.

The summary (often through a sample statistic such as the mean, median, correlation, etc.) will fluctuate from sample to sample.

We would like to know the magnitude of these fluctuations to get an overall picture. This fluctuation can often be described in the form of a probability distribution called a sampling distribution.


Motivation/background (continued). Suppose we do not make many assumptions (do not know much) about the underlying population:

Ideally, if we could repeatedly draw samples from the target distribution ⟹ we would have many copies of the sample statistic, one from each repeatedly drawn sample ⟹ these many copies of the sample statistic would give us a good idea about the fluctuation and the sampling distribution.

But, in reality, we only have one set (copy) of observed data (sample).


The idea behind the bootstrap is to use the observed sample data as a "surrogate population" for the purpose of approximating the sampling distribution of a statistic.

Specifically,

- We resample with replacement from the sample data at hand and create a large number of "phantom samples" known as bootstrap samples.

- These bootstrap samples can be used to quantify the fluctuation ("make inference") of a "population parameter" of the "surrogate population".

- Under some conditions, the "phantom" inference is the same as (or can help to derive) the real inference that we are looking for.

This leads us to the "bootstrap method"!



Read the companion note (an introductory review article) by Singh and Xie (2010) in the International Encyclopedia of Education:

http://stat.rutgers.edu/~mxie/RCPapers/bootstrap.pdf
or
http://www.sciencedirect.com/science/referenceworks/9780080448947


Setting/Setup:
We have a sample data set $x_1, \ldots, x_n \overset{\text{i.i.d.}}{\sim} F(x)$. Let $\theta$ be a population characteristic of the distribution $F$, and suppose we have an estimator $\hat\theta$ for $\theta$, where $\hat\theta = \hat\theta(x_1, \ldots, x_n)$ is a function of the sample $\mathbf{x} = (x_1, \ldots, x_n)$.

For example, $\theta$ is the mean of the distribution $F$ (the population mean) and $\hat\theta = \bar{X}$.

Goal:
We need to make inference about $\theta$. Besides the point estimator $\hat\theta$, we would like to know the sampling distribution of $\hat\theta = \hat\theta(x_1, \ldots, x_n)$; in particular, we want to find a confidence interval for $\theta$, etc.


Bootstrap (generate) a set of new ("phantom") data

From the observed sample data set $\{x_1, x_2, \ldots, x_n\}$, resample with replacement to get a new data set of size $n$:

- Randomly pick (each with probability $1/n$) a data point from $\{x_1, x_2, \ldots, x_n\}$ and set it to be $x_1^*$; repeat exactly the same random pick $n - 1$ more times to get $x_2^*, x_3^*, \ldots, x_n^*$.

This new set of data $\{x_1^*, \ldots, x_n^*\}$ is called a bootstrap sample.

To make statistical inference, we repeat this bootstrap sampling process a large number of times (say $N$) to get $N$ bootstrap samples.
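The resampling step can be sketched in a few lines of R. This is only an illustration: the data vector `x` is made up and is not from the lecture, and `sample()` with `replace = TRUE` is used to pick each point with probability 1/n.

```r
set.seed(1)                      # fixed seed so the sketch is reproducible
x <- c(2.1, 3.4, 1.9, 5.6, 4.2)  # hypothetical observed sample (n = 5)
n <- length(x)

# One bootstrap sample: n draws with replacement, each point with prob. 1/n
x.star <- sample(x, size = n, replace = TRUE)

# N bootstrap samples, stored as the columns of an n x N matrix
N <- 1000
x.star.all <- replicate(N, sample(x, size = n, replace = TRUE))
dim(x.star.all)
```

Each column of `x.star.all` is one "phantom" data set; some observed points typically appear more than once in a column and others not at all.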



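A bootstrap sampling algorithm to get a CI for $\theta$:

1. Generate $N$ bootstrap data sets, each of size $n$, and compute the corresponding bootstrap estimator $\hat\theta^*$:

   1st: $\{x_1^*, \ldots, x_n^*\}_{[1]} \sim \{x_1, \ldots, x_n\}$, $\quad \hat\theta_1^* = \hat\theta\left(\{x_1^*, \ldots, x_n^*\}_{[1]}\right)$

   2nd: $\{x_1^*, \ldots, x_n^*\}_{[2]} \sim \{x_1, \ldots, x_n\}$, $\quad \hat\theta_2^* = \hat\theta\left(\{x_1^*, \ldots, x_n^*\}_{[2]}\right)$

   ...

   Nth: $\{x_1^*, \ldots, x_n^*\}_{[N]} \sim \{x_1, \ldots, x_n\}$, $\quad \hat\theta_N^* = \hat\theta\left(\{x_1^*, \ldots, x_n^*\}_{[N]}\right)$

2. Sort $\{\hat\theta_1^*, \hat\theta_2^*, \ldots, \hat\theta_N^*\}$ from the smallest to the largest. Now we have $\hat\theta_{(1)}^* \le \hat\theta_{(2)}^* \le \cdots \le \hat\theta_{(N)}^*$; a histogram can be constructed.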
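The two steps above might be wrapped in a small helper, as in the sketch below. The function name `boot.replicates`, its arguments, and the simulated sample are assumptions made for illustration; they do not appear in the lecture.

```r
# Hypothetical helper: returns N sorted bootstrap replicates of stat(x)
boot.replicates <- function(x, stat, N = 1000) {
  n <- length(x)
  theta.star <- numeric(N)
  for (k in 1:N) {
    # Step 1: resample n indices with replacement, then recompute the statistic
    x.star <- x[sample.int(n, n, replace = TRUE)]
    theta.star[k] <- stat(x.star)
  }
  # Step 2: order statistics theta*_(1) <= ... <= theta*_(N)
  sort(theta.star)
}

set.seed(2)
x <- rexp(30)                                   # made-up skewed sample
theta.star <- boot.replicates(x, mean, N = 1000)
range(theta.star)                               # smallest and largest replicate
```

Calling `hist(theta.star)` on the result gives the histogram mentioned in step 2.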


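Claim (symmetric case): A $(1-\alpha)100\%$ confidence interval for the parameter $\theta$ is simply
$$\left[\hat\theta^*_{(L)},\ \hat\theta^*_{(U)}\right],$$
where $L = \dfrac{\alpha N}{2}$ and $U = \left(1 - \dfrac{\alpha}{2}\right) N$, for $0 < \alpha < 1$.

For example, a 95% C.I. for the parameter $\theta$ is $\left[\hat\theta^*_{(25)},\ \hat\theta^*_{(975)}\right]$ if $N = 1000$. (Note: $L = 1000 \times .025 = 25$, $U = 1000 \times .975 = 975$.)

An equivalent way to write the above confidence interval is simply
$$\left[\hat\theta^*_{\alpha/2},\ \hat\theta^*_{1-\alpha/2}\right],$$
where $\hat\theta^*_{q}$ denotes the $q$-quantile of the bootstrap estimates.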
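As a rough sketch of the symmetric-case interval, the code below picks the order statistics at positions L and U. The sample and the choice of the mean as the statistic are assumptions for illustration. The indices are rounded because products such as `0.975 * 1000` are not guaranteed to be exact in floating point, and R truncates non-integer indices.

```r
set.seed(3)
x <- rnorm(40, mean = 10, sd = 2)   # made-up sample
N <- 1000
alpha <- 0.05

# Sorted bootstrap replicates of the sample mean
theta.star <- sort(replicate(N, mean(sample(x, replace = TRUE))))

L <- round(alpha / 2 * N)           # 25
U <- round((1 - alpha / 2) * N)     # 975
ci.perc <- c(lower = theta.star[L], upper = theta.star[U])
ci.perc
```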


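Claim (asymmetric case): A $(1-\alpha)100\%$ confidence interval for the parameter $\theta$ is
$$\left[2\hat\theta - \hat\theta^*_{(U)},\ 2\hat\theta - \hat\theta^*_{(L)}\right],$$
where $L = \dfrac{\alpha N}{2}$ and $U = \left(1 - \dfrac{\alpha}{2}\right) N$, for $0 < \alpha < 1$.

For example, a 95% C.I. for the parameter $\theta$ is just $\left[2\hat\theta - \hat\theta^*_{(975)},\ 2\hat\theta - \hat\theta^*_{(25)}\right]$ if $N = 1000$.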
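The asymmetric-case interval reflects the bootstrap order statistics around the original estimate. The sketch below assumes a made-up skewed sample and uses the median as the statistic; neither comes from the lecture.

```r
set.seed(4)
x <- rexp(40, rate = 0.5)           # made-up skewed sample
theta.hat <- median(x)              # estimate from the original data
N <- 1000
alpha <- 0.05

# Sorted bootstrap replicates of the sample median
theta.star <- sort(replicate(N, median(sample(x, replace = TRUE))))

L <- round(alpha / 2 * N)
U <- round((1 - alpha / 2) * N)

# Note the swap: the upper order statistic gives the lower limit
ci.refl <- c(lower = 2 * theta.hat - theta.star[U],
             upper = 2 * theta.hat - theta.star[L])
ci.refl
```

The interval has the same width as the symmetric-case interval; only its position relative to the estimate changes.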


Example (cf. Example 2 of the companion note by Singh and Xie (2010)):

Data: Two types of measurements to assess body fat in n = 20 collegiate football players

    BOD   2.5  4.0  4.1  6.2  7.1  7.0  8.3  9.2  9.3 12.0 12.2 12.6 14.2 14.4 15.1 15.2 16.3 17.1 17.9 17.9
    HW    8.0  6.2  9.2  6.4  8.6 12.2  7.2 12.0 14.9 12.1 15.3 14.8 14.3 16.3 17.9 19.5 17.5 14.3 18.3 16.2

- BOD refers to the BOD POD, a whole-body air-displacement plethysmograph.

- HW refers to hydrostatic weighing.

Question: Study the correlation between the BOD and HW measurements (find a confidence interval).


    ## R program of bootstrap algorithm for correlation parameter
    ## Example 2 of Singh and Xie (2010)

    # Data (entered with c() so the script also runs non-interactively)
    BOD = c(2.5, 4.0, 4.1, 6.2, 7.1, 7.0, 8.3, 9.2, 9.3, 12.0,
            12.2, 12.6, 14.2, 14.4, 15.1, 15.2, 16.3, 17.1, 17.9, 17.9)
    HW  = c(8.0, 6.2, 9.2, 6.4, 8.6, 12.2, 7.2, 12.0, 14.9, 12.1,
            15.3, 14.8, 14.3, 16.3, 17.9, 19.5, 17.5, 14.3, 18.3, 16.2)

    data.ex2 = cbind(BOD, HW)

    ## Boxplot of the data and the scatter plot
    par(mfrow = c(1, 2))
    boxplot(data.ex2); plot(BOD, HW)


[Figure: side-by-side boxplots of BOD and HW (left); scatter plot of HW against BOD (right).]


A generic bootstrap algorithm for this example

Step 1: At each iteration $k = 1, 2, \ldots, N$ (with $N = 1000$), generate a bootstrap data set of $n = 20$ pairs and compute its correlation:

1. For $i = 1, \ldots, 20$, randomly sample a pair $(x_i^*, y_i^*)$ from the 20 observed data pairs $\{(2.5, 8.0), (4.0, 6.2), \ldots, (17.9, 16.2)\}$ (sample with replacement). These 20 new pairs form a bootstrap sample $(\mathbf{x}^*, \mathbf{y}^*) = \{(x_1^*, y_1^*), (x_2^*, y_2^*), \ldots, (x_{20}^*, y_{20}^*)\}$.

2. Compute the bootstrap sample correlation coefficient $\hat\rho^* = \mathrm{corr}(\mathbf{x}^*, \mathbf{y}^*)$.

Step 2: Produce a histogram of the $N = 1000$ values of $\hat\rho^*$ and also sort them. The histogram ("Histogram of corr.b") suggests that the bootstrap distribution is skewed, so the 95% confidence interval for $\rho$ is $\left[2\hat\rho - \hat\rho^*_{(975)},\ 2\hat\rho - \hat\rho^*_{(25)}\right]$.


    ## R program of bootstrap algorithm for correlation parameter
    ## Example 2 of Singh and Xie (2010)

    # Bootstrapping and calculation of bootstrap corr coef.
    corr.b = matrix(0, 1000)

    for (i in 1:1000) {
      # generate a new bootstrap sample (resample row indices so pairs stay together)
      indx = sample(1:nrow(data.ex2), replace = TRUE)
      data.bt = data.ex2[indx, ]

      # calculate correlation coefficient
      corr.b[i] = cor(data.bt[, 1], data.bt[, 2])
    }


    > ## Estimate of corr using the original data
    > cor(data.ex2[,1], data.ex2[,2]); cor(BOD, HW)
    [1] 0.8678753
    [1] 0.8678753

    > # Histogram and bootstrap 95% CI
    > hist(corr.b); summary(corr.b)
     Min.   :0.6495
     1st Qu.:0.8434
     Median :0.8736
     Mean   :0.8667
     3rd Qu.:0.8966
     Max.   :0.9584

    > corr.b.srt = sort(corr.b)
    > CI.95 = c(2 * cor(BOD, HW) - corr.b.srt[975],
    +           2 * cor(BOD, HW) - corr.b.srt[25]); CI.95
    [1] 0.7998790 0.9692848


[Figure: Histogram of corr.b (x-axis: corr.b, from 0.6 to 1.0; y-axis: Frequency).]


Bootstrap Central Limit Theory (Singh, 1981):

Theorem: Under some mild conditions, when $n$ is large ($n \to \infty$),
$$\left(\hat\theta^* - \hat\theta\right) \,\Big|\, \hat\theta \ \sim\ \left(\hat\theta - \theta_0\right) \,\Big|\, \theta_0. \tag{1}$$

Proof: Omitted.

(Notation: The distribution in (1) has cumulative distribution function $G(\cdot)$.)


Based on the Bootstrap Central Limit Theory, we can show that the two claims above (the symmetric-case and asymmetric-case confidence intervals) are justified.

Proof of the two claims:

Case (i): The distribution (1) is symmetric.

We define the cumulative distribution function of the bootstrap estimator given the sample data:
$$B_n(t) = P\left(\hat\theta^* \le t \,\Big|\, \hat\theta\right).$$

($B_n(t)$ is also known as the bootstrap distribution.)


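We have the following statements:

- $B_n(t)$ is monotonically increasing in $t$ (since it is a cumulative distribution function).

- When $t = \hat\theta^*_\alpha$, $B_n\left(\hat\theta^*_\alpha\right) = P\left(\hat\theta^* \le \hat\theta^*_\alpha \,\big|\, \hat\theta\right) = \alpha$. So we know that $\hat\theta^*_\alpha = B_n^{-1}(\alpha)$.

- When $t = \theta_0$, the true parameter value, we have
$$\begin{aligned}
B_n(\theta_0) &= P\left(\hat\theta^* \le \theta_0 \,\Big|\, \hat\theta\right) = P\left(\hat\theta^* - \hat\theta \le \theta_0 - \hat\theta \,\Big|\, \hat\theta\right) \\
&= G\left(\theta_0 - \hat\theta\right) && \text{(by $G$'s definition)} \\
&\overset{d}{=} G\left(\hat\theta - \theta_0\right) && \text{(by symmetry)} \\
&\sim U(0, 1) && \text{(by the theorem, we also have } \hat\theta - \theta_0 \sim G\text{)}
\end{aligned}$$

Here $\overset{d}{=}$ denotes equality in distribution: because $G$ is symmetric about zero, $\theta_0 - \hat\theta$ and $\hat\theta - \theta_0$ have the same distribution.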


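So for any $\alpha$, $0 < \alpha < 1$, with $U \sim U(0, 1)$,
$$P\left\{\theta_0 \le \hat\theta^*_\alpha\right\} = P\left\{\theta_0 \le B_n^{-1}(\alpha)\right\} = P\left\{B_n(\theta_0) \le \alpha\right\} = P(U \le \alpha) = \alpha.$$

Thus, $\left[\hat\theta^*_{2.5\%},\ \hat\theta^*_{97.5\%}\right]$ is a 95% confidence interval for $\theta$ (with 95% confidence to cover the true $\theta_0$), because
$$P\left(\hat\theta^*_{2.5\%} \le \theta_0 \le \hat\theta^*_{97.5\%}\right) = P\left(\theta_0 \le \hat\theta^*_{97.5\%}\right) - P\left(\theta_0 \le \hat\theta^*_{2.5\%}\right) = 97.5\% - 2.5\% = 95\%.$$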


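Case (ii): The distribution (1) is not symmetric.

We define
$$C_n(t) = P\left(2\hat\theta - \hat\theta^* \le t \,\Big|\, \hat\theta\right).$$

We have the following statements:

- $C_n(t)$ is monotonically increasing in $t$.

- When $t = 2\hat\theta - \hat\theta^*_\alpha$, $C_n\left(2\hat\theta - \hat\theta^*_\alpha\right) = P\left(2\hat\theta - \hat\theta^* \le 2\hat\theta - \hat\theta^*_\alpha \,\big|\, \hat\theta\right) = P\left(\hat\theta^* \ge \hat\theta^*_\alpha \,\big|\, \hat\theta\right) = 1 - \alpha$. So we know that $2\hat\theta - \hat\theta^*_\alpha = C_n^{-1}(1 - \alpha)$.

- When $t = \theta_0$, the true parameter value, we have
$$\begin{aligned}
C_n(\theta_0) &= P\left(2\hat\theta - \hat\theta^* \le \theta_0 \,\Big|\, \hat\theta\right) = P\left(\hat\theta^* - \hat\theta \ge \hat\theta - \theta_0 \,\Big|\, \hat\theta\right) \\
&= 1 - G\left(\hat\theta - \theta_0\right) && \text{(by $G$'s definition)} \\
&\sim U(0, 1) && \text{(by the theorem, we also have } \hat\theta - \theta_0 \sim G\text{)}
\end{aligned}$$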


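So for any $\alpha$, $0 < \alpha < 1$, with $U \sim U(0, 1)$,
$$P\left\{\theta_0 \le 2\hat\theta - \hat\theta^*_\alpha\right\} = P\left\{\theta_0 \le C_n^{-1}(1 - \alpha)\right\} = P\left\{C_n(\theta_0) \le 1 - \alpha\right\} = P(U \le 1 - \alpha) = 1 - \alpha.$$

Thus, $\left[2\hat\theta - \hat\theta^*_{97.5\%},\ 2\hat\theta - \hat\theta^*_{2.5\%}\right]$ is a 95% confidence interval for $\theta$ (with 95% confidence to cover the true $\theta_0$), because
$$P\left(2\hat\theta - \hat\theta^*_{97.5\%} \le \theta_0 \le 2\hat\theta - \hat\theta^*_{2.5\%}\right) = P\left(\theta_0 \le 2\hat\theta - \hat\theta^*_{2.5\%}\right) - P\left(\theta_0 \le 2\hat\theta - \hat\theta^*_{97.5\%}\right) = 97.5\% - 2.5\% = 95\%.$$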


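Other primary applications (besides CIs) of the bootstrap sampling method

Approximating the standard error of a sample estimate: use
$$se_B = \left[\frac{1}{N} \sum_{i=1}^{N} \left(\hat\theta_i^* - \hat\theta\right)^2\right]^{1/2}$$
to estimate the standard error $se(\hat\theta)$.

Bias correction by bootstrap: often $\mathrm{Bias}(\hat\theta) = E(\hat\theta) - \theta_0 = O(1/n)$. This bias can be estimated by
$$\mathrm{Bias}_B = \frac{1}{N} \sum_{i=1}^{N} \hat\theta_i^* - \hat\theta.$$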
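Both formulas are one-liners once the bootstrap replicates are available. The sketch below assumes a made-up sample and uses the sample standard deviation as the estimator; the variable names are illustrative only.

```r
set.seed(5)
x <- rexp(50)                       # made-up sample
theta.hat <- sd(x)                  # estimator: sample standard deviation
N <- 2000

# Bootstrap replicates of the estimator
theta.star <- replicate(N, sd(sample(x, replace = TRUE)))

# se_B: root mean squared deviation of the replicates from theta.hat
se.B <- sqrt(mean((theta.star - theta.hat)^2))

# Bias_B: average replicate minus the original estimate
bias.B <- mean(theta.star) - theta.hat

# Bias-corrected estimate: subtract the estimated bias
theta.corrected <- theta.hat - bias.B
c(se = se.B, bias = bias.B, corrected = theta.corrected)
```

The corrected estimate equals `2 * theta.hat - mean(theta.star)`, which mirrors the reflection used in the asymmetric-case interval.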


Let's bootstrap Bill Gates! ... Happy "bootstrappers"


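Application: Bootstrap method in regression models

Linear regression model
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \text{for } i = 1, 2, \ldots, n,$$
where $\varepsilon_i \sim (0, \sigma^2)$, i.e., the errors have mean 0 and variance $\sigma^2$.

Least squares (LS) estimator
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$

We want to make inference on $\beta_1$.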


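If $\varepsilon_i \sim N(0, \sigma^2)$,
$$\hat\beta_1 \sim N\left(\beta_1,\ \sigma^2 \left\{\sum_{i=1}^{n} (x_i - \bar{x})^2\right\}^{-1}\right), \qquad (n - 2)s^2 \sim \sigma^2 \chi^2_{n-2}.$$

⟹ we can use the conventional $t$ test (or $z$ test when $n$ is large), based on $t = \hat\beta_1 / \mathrm{se}(\hat\beta_1)$ with $\mathrm{se}(\hat\beta_1) = s \left\{\sum_{i=1}^{n} (x_i - \bar{x})^2\right\}^{-1/2}$, to make inference on $\beta_1$.

Alternatively, we can use the bootstrap approach to make inference on $\beta_1$ (we only need to assume $\varepsilon_i \sim (0, \sigma^2)$).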



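Method 1: Resample data pairs

Resample from the pairs $(x_i, y_i)$, preserving the pairs:
$$\begin{array}{ccc}
(x_1, y_1) & & (x_1^*, y_1^*) \\
(x_2, y_2) & & (x_2^*, y_2^*) \\
\vdots & \overset{\text{bootstrap}}{\Longrightarrow} & \vdots \\
(x_n, y_n) & & (x_n^*, y_n^*) \\
\Downarrow & & \Downarrow \\
\hat\beta_1 & & \hat\beta_1^*
\end{array}$$

Repeat $N$ times to get $N$ copies of $\hat\beta_1^*$.

Based on these $N$ copies of $\hat\beta_1^*$, we can make inference about $\beta_1$ (compute confidence intervals, carry out tests, etc.).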
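A minimal sketch of Method 1 with a continuous covariate is given below. The data are simulated (true slope 0.5) and the helper `slope()` is a hypothetical name; it implements the LS formula for the slope. Resampling row indices keeps each $(x_i, y_i)$ pair together.

```r
set.seed(6)
n <- 30
x <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n)         # simulated data; true slope is 0.5

# LS slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
slope <- function(x, y) {
  sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
}

N <- 1000
beta1.star <- replicate(N, {
  idx <- sample.int(n, n, replace = TRUE)  # resample pairs via row indices
  slope(x[idx], y[idx])
})

# Percentile-type 95% interval from the bootstrap slopes
quantile(beta1.star, c(0.025, 0.975))
```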


Method 2: Resample residuals

- Based on the sample data $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, obtain the LS estimates $\hat\beta_0$ and $\hat\beta_1$. Also compute the residuals $\{e_1, \ldots, e_n\}$.

- Resample (with replacement) from the residual set $\{e_1, \ldots, e_n\}$ to obtain bootstrap residuals $\{e_1^*, \ldots, e_n^*\}$.

- Define $y_i^* = \hat\beta_0 + \hat\beta_1 x_i + e_i^*$, for $i = 1, \ldots, n$, so that we have a bootstrap data set $\{(x_1, y_1^*), (x_2, y_2^*), \ldots, (x_n, y_n^*)\}$.

- Based on this bootstrap data set, we can get a bootstrap estimate $\hat\beta_1^*$.


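Method 2: Resample residuals (continued)

- Repeat the resampling steps (resample the residuals, form the bootstrap data set, and re-estimate) $N$ times to get $N$ copies of $\hat\beta_1^*$.

- Based on these $N$ copies of $\hat\beta_1^*$, we can make inference about $\beta_1$ (compute confidence intervals, carry out tests, etc.).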
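Method 2 can be sketched in the same way. The simulated data below are an assumption for illustration; the design points `x` stay fixed and only the residuals are resampled.

```r
set.seed(7)
n <- 30
x <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n)         # simulated data; true slope is 0.5

fit   <- lm(y ~ x)                  # LS fit on the original data
e     <- residuals(fit)             # residuals e_1, ..., e_n
y.fit <- fitted(fit)                # beta0.hat + beta1.hat * x_i

N <- 1000
beta1.star <- replicate(N, {
  # y*_i = fitted value + resampled residual; x_i is kept as observed
  y.star <- y.fit + sample(e, n, replace = TRUE)
  coef(lm(y.star ~ x))[["x"]]
})

quantile(beta1.star, c(0.025, 0.975))
```

Compared with resampling pairs, this scheme leans on the model being correctly specified with errors that are exchangeable across observations.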


    ### Example: Bootstrap method for regression
    ## Annette Dobson (1990) "An Introduction to
    ## Generalized Linear Models".
    ## Page 9: Plant Weight Data.
    ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
    trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
    group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
    weight <- c(ctl, trt)
    mydata <- data.frame(weight, group)

    ## Linear regression:
    lm.D9 <- lm(weight ~ group, data = mydata)


    > summary(lm.D9)
    ## Parameter estimate
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   5.0320     0.2202  22.850 9.55e-15 ***
    groupTrt     -0.3710     0.3114  -1.191    0.249

    ## 95% Confidence Intervals:
    > confint(lm.D9, "groupTrt")
               2.5 %    97.5 %
    groupTrt -1.0253 0.2833003


    ## Bootstrap Methods:

    ## Function for method 1 (resample data pairs):
    boot.meth1 <- function(data = mydata, indices){
      data <- data[indices,]  # select obs. in bootstrap sample
      mod <- lm(formula = weight ~ group, data = data)
      coefficients(mod)  # return coefficient vector
    }

    ## Function for method 2 (resample residuals):
    boot.meth2 <- function(data = mydata, indices, fit = lm.D9){
      # fitted values plus resampled residuals (uses the fit argument)
      weight.boot <- fitted(fit) + residuals(fit)[indices]
      data.star <- data; data.star[,1] <- weight.boot
      mod <- lm(weight ~ group, data = data.star)
      coefficients(mod)  # return coefficient vector
    }


    > #### Use my own code to run bootstrap
    > ## Bootstrap sample size 5000
    > b1.vec = b2.vec = rep(0, 5000)
    > for (ii in 1:5000) {
    + b.indx = sample(1:nrow(mydata), replace = TRUE)
    + b1.vec[ii] = boot.meth1(mydata, b.indx)["groupTrt"]
    + b2.vec[ii] = boot.meth2(mydata, b.indx, lm.D9)["groupTrt"]}
    > ## Confidence intervals
    > b1.vec = sort(b1.vec); b2.vec = sort(b2.vec)
    > c(low = b1.vec[125], up = b1.vec[4875]);
           low         up
    -0.9650505  0.2376923
    > c(low = b2.vec[125], up = b2.vec[4875]);
        low      up
    -0.9632  0.1942
    > ## histograms
    > par(mfrow = c(1,2)); hist(b1.vec); hist(b2.vec)


[Figure: Histogram of b1.vec (left) and Histogram of b2.vec (right); y-axis: Frequency.]


    > ## Run bootstrap through R's "boot" function:
    > library(boot)
    > out.boot.meth1 <- boot(mydata, boot.meth1, 5000)
    > out.boot.meth1
    Bootstrap Statistics :
        original        bias    std. error
    t1*    5.032 -0.002156102   0.1769029
    t2*   -0.371  0.005725787   0.3045753

    > boot.ci(out.boot.meth1, index=2, type=c("norm", "perc"))
    BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
    Based on 5000 bootstrap replicates

    Intervals :
    Level      Normal              Percentile
    95%   (-0.9737,  0.2202 )   (-0.9492,  0.2200 )
    Calculations and Intervals on Original Scale


    > out.boot.meth2 <- boot(mydata, boot.meth2, 5000)
    > out.boot.meth2
    Bootstrap Statistics :
        original       bias    std. error
    t1*    5.032  0.00198706   0.2065991
    t2*   -0.371 -0.00548282   0.2957972

    > boot.ci(out.boot.meth2, index=2, type=c("norm", "perc"))
    BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
    Based on 5000 bootstrap replicates

    Intervals :
    Level      Normal              Percentile
    95%   (-0.9453,  0.2142 )   (-0.9500,  0.2023 )
    Calculations and Intervals on Original Scale


> plot(out.boot.meth1, index = 2)

[Figure: histogram of t* on the density scale (left); normal Q-Q plot of t* against quantiles of the standard normal (right).]


> plot(out.boot.meth2, index = 2)

[Figure: histogram of t* on the density scale (left); normal Q-Q plot of t* against quantiles of the standard normal (right).]


Further remarks on bootstrap estimation

We introduced the bootstrap approach and illustrated it using some basic and regression examples. The methodology is very broad and can be used in many applications.

- Because it is a simulation-based method, one may not get exactly the same numerical answer when repeating the same code. (A common practical solution to this problem: fix the random seed at the beginning.)
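The effect of fixing the seed can be checked directly. In the sketch below, `boot.mean.ci` is a hypothetical helper and the seed value is arbitrary; the data are the `ctl` weights from the regression example.

```r
# Hypothetical helper: 95% percentile-type interval for the mean
boot.mean.ci <- function(x, N = 1000) {
  theta.star <- replicate(N, mean(sample(x, replace = TRUE)))
  quantile(theta.star, c(0.025, 0.975))
}

x <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)

set.seed(2016); ci1 <- boot.mean.ci(x)   # first run
set.seed(2016); ci2 <- boot.mean.ci(x)   # same seed: same resamples
ci3 <- boot.mean.ci(x)                   # no reset: different resamples

identical(ci1, ci2)   # TRUE
```

Without the reset, `ci3` will generally differ slightly from `ci1`; the difference is Monte Carlo error and shrinks as `N` grows.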


Further remarks on bootstrap estimation (continued)

Most bootstrap methods are developed for independent observations. When they are used to study correlations or dependent data, the key is to preserve the correlation/dependence.

- For example, in our examples on correlation coefficients and regressions, we tried to preserve the correlation/dependence.

- For dependent samples (for example, time series models, Brownian motion or other stochastic processes), a useful scheme is the moving-block bootstrap. [Self-study material: http://www2.econ.iastate.edu/classes/econ674/bunzel/documents/DepBootstrap.pdf]
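For orientation, a bare-bones moving-block bootstrap might look like the sketch below. The helper name `mbb.sample`, the block length of 10, and the simulated AR(1) series are all assumptions for illustration; the self-study material should be consulted for how to choose the block length.

```r
# Hypothetical helper: one moving-block bootstrap series of length n
mbb.sample <- function(x, block.len) {
  n <- length(x)
  n.blocks <- ceiling(n / block.len)
  # Random starting points of overlapping blocks of consecutive observations
  starts <- sample.int(n - block.len + 1, n.blocks, replace = TRUE)
  idx <- unlist(lapply(starts, function(s) s:(s + block.len - 1)))
  x[idx[1:n]]                       # glue blocks together, trim to length n
}

set.seed(9)
x <- as.numeric(arima.sim(list(ar = 0.6), n = 200))  # simulated AR(1) series
x.star <- mbb.sample(x, block.len = 10)

# Replicates of the sample mean under block resampling
N <- 500
mean.star <- replicate(N, mean(mbb.sample(x, block.len = 10)))
sd(mean.star)                       # block-bootstrap standard error of the mean
```

Resampling whole blocks, rather than single observations, keeps the short-range dependence inside each block intact.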


Good night!