
Estimation and Accuracy after Model Selection


Page 1: Estimation and Accuracy after Model Selection

Outline: Bradley Efron · Motivation · Bootstrap Smoothing · Results

Estimation and Accuracy after Model Selection, by Bradley Efron (Stanford)

Presented by Sahir Rai Bhatnagar
McGill University
[email protected]

April 7, 2014

1 / 43

Page 2: Who?

Born in St. Paul, Minnesota in 1938 to Jewish-Russian immigrants

B.S., Mathematics, Caltech (1960)

Ph.D., Statistics (1964), under the direction of Rupert Miller and Herb Solomon

Professor of Statistics at Stanford for the past 50 years

2 / 43


Page 4: Achievements

Best known for the Bootstrap, Annals of Statistics (1977)

Founding Editor, Annals of Applied Statistics

Awarded the Guy Medal in Gold from the RSS (2014); 34 awarded since 1892, including Rao, Cox, Fisher and Nelder

3 / 43

Page 5: Achievements

National Medal of Science (2005)

Established by Congress in 1959 and administered by the National Science Foundation, the medal is the nation's highest scientific honour.

4 / 43

Page 6: Some Quotes

"Statistics is the science of information gathering, especially when the information arrives in little pieces instead of big ones."

"Statistics did not come naturally to me. Dad's keeping score for the baseball league helped a lot."

"I spent the first year at Stanford in the Math Department... After, I started taking stats courses, which I thought would be easy. In fact I found them harder."

5 / 43


Page 9: Motivation (outline)

A Quick Review of the Bootstrap
Typical Model Selection Setting
Cholesterol Data Example
Prostate Data Example

6 / 43


Page 11: Typical Model Selection Setting

Look at the data: one response, many covariates.

Identify a list of candidate models M: 2^p submodels; linear, quadratic, cubic, ...

Perform model selection (see Abbas class notes).

Do inference based on the chosen model: prediction, confidence intervals.

Today's Question: Should we care about the variability of the variable selection step in our post-selection inference?

8 / 43


Page 16: Cholesterol Data Example

n = 164 men took cholestyramine (meant to reduce cholesterol in the blood) for 7 years

x: a compliance measure, adjusted so that x ∼ N(0, 1)

y: cholesterol decrease

Perform a regression of y on x.

We want to predict the cholesterol decrease for a given compliance value:

µ = E[y | x]

9 / 43

Page 17: Cholesterol Data Example

Multiple linear regression model:

Y = Xβ + ε,  εᵢ ∼ N(0, σ²)

6 candidate models, M = {linear, quadratic, ..., sextic}, e.g.

y = β₀ + β₁x + β₂x² + ... + β₆x⁶ + ε

Cp criterion for model selection:

Cp(M) = SSres(M)/n + 2σ̂² p_M / n

where the first term measures goodness of fit and the second penalizes complexity (p_M is the number of parameters in model M).

Use the OLS estimate of β from the chosen model and predict:

µ̂ = Xβ̂

10 / 43

Page 18: An Example: Nonparametric Bootstrap Analysis

Bootstrap the data:

data* = {(x_j, y_j)*, j = 1, ..., n}

where the pairs (x_j, y_j)* are drawn randomly with replacement from the original data.

data* → (Cp) → M* → (OLS) → β̂*_{M*} → µ̂* = X_{M*} β̂*_{M*}

Repeat B = 4000 times.

11 / 43
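The resampling loop above can be sketched in code. The deck's analysis was done in R; the following is an illustrative Python sketch on simulated data. The data-generating model, noise scale, prediction point x0, and the reduced B are assumptions for demonstration, not the cholesterol data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 164
x = rng.normal(size=n)                                    # compliance, standardized
y = 5 + 8 * x - 2 * x**2 + rng.normal(scale=4, size=n)    # simulated decrease

def cp_choose_and_fit(x, y, max_deg=6):
    """Pick the polynomial degree by Cp and return its OLS coefficients."""
    n = len(y)
    X_full = np.vander(x, max_deg + 1, increasing=True)
    b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    sigma2 = np.sum((y - X_full @ b_full) ** 2) / (n - (max_deg + 1))
    best = None
    for d in range(1, max_deg + 1):
        Xd = np.vander(x, d + 1, increasing=True)
        b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        cp = np.sum((y - Xd @ b) ** 2) / n + 2 * sigma2 * (d + 1) / n
        if best is None or cp < best[0]:
            best = (cp, b)
    return best[1]

B, x0 = 500, 1.0                         # B reduced from 4000 for speed
mu_star = []
for _ in range(B):
    idx = rng.integers(0, n, n)          # resample (x, y) pairs with replacement
    b = cp_choose_and_fit(x[idx], y[idx])
    mu_star.append(sum(c * x0**k for k, c in enumerate(b)))
mu_star = np.array(mu_star)

print(mu_star.mean())                    # smoothed (bagged) prediction at x0
```

Each pass of the loop mimics one arrow chain data* → Cp → M* → OLS → µ̂* from the slide.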

Page 19: [Figure reproduced from Efron 2013.]

12 / 43

Page 20: [Figure reproduced from Efron 2013.]

13 / 43

Page 21: Prostate Data Example

Examine the relation between the level of PSA and clinical measures.

n = 97 men who were about to receive a prostatectomy

x = (x1, ..., x8): clinical measures (adjusted so that x ∼ N(0, 1))

y = log PSA

Perform a regression of y on x.

8 candidate models were identified using regsubsets with nbest=1.

We want to estimate

µ_j = E[y | x_j],  j = 1, ..., 97

14 / 43

Page 22: [Histogram: fitted values µ̂₉₅ for subject 95, from B = 4000 nonparametric bootstrap replications of the Cp-chosen model; 60% of the replications are greater than the original estimate of 3.6.]

15 / 43

Page 23: [Histogram: fitted values µ̂₉₅ for subject 95, from B = 4000 nonparametric bootstrap replications, separated by the three models most frequently chosen by Cp (m3: 18%, m5: 22%, m7: 24%); original estimate 3.6 based on the Cp-chosen model.]

16 / 43

Page 24: [Boxplot: fitted values µ̂₉₅ for subject 95, for the model chosen by the Cp criterion, based on B = 4000 nonparametric bootstrap samples; selection frequencies m2: 1%, m3: 18%, m4: 12%, m5: 22%, m6: 15%, m7: 24%, m8: 8%.]

17 / 43

Page 25: Questions

Are you convinced there is a problem with the way we do post-selection inference?

Is the juice worth the squeeze?

18 / 43


Page 27: Bagging (Breiman 1996)

Replace the original estimator µ̂ = t(y) with the bootstrap average

µ̃ = s(y) = (1/B) ∑_{i=1}^{B} t(y*ᵢ)

where y*ᵢ is the i-th bootstrap sample.

Also known as model averaging.

"If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy."

19 / 43


Page 29: Main Contribution of this Paper

t*ᵢ = t(y*ᵢ), i = 1, ..., B  (value of the statistic in bootstrap sample i)

Y*ᵢⱼ = number of times the j-th data point appears in the i-th bootstrap sample

covⱼ = cov(Y*ᵢⱼ, t*ᵢ)

The nonparametric estimate of the standard deviation of the ideal smoothed bootstrap statistic µ̃ = s(y) = B⁻¹ ∑_{i=1}^{B} t(y*ᵢ) is

s̃d = [ ∑_{j=1}^{n} covⱼ² ]^{1/2}

20 / 43


Page 34: Main Contribution of this Paper

Note that covⱼ = cov(Y*ᵢⱼ, t*ᵢ) is an unknown quantity, so we must estimate it. The estimate of the standard deviation of µ̃ = s(y) in the non-ideal case is

s̃d_B = [ ∑_{j=1}^{n} ĉovⱼ² ]^{1/2}

with

ĉovⱼ = B⁻¹ ∑_{i=1}^{B} (Y*ᵢⱼ − Y*·ⱼ)(t*ᵢ − t*·)

Y*·ⱼ = B⁻¹ ∑_{i=1}^{B} Y*ᵢⱼ,   t*· = B⁻¹ ∑_{i=1}^{B} t*ᵢ

21 / 43
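The estimator above translates directly into code. A minimal Python sketch, taking t(·) to be the sample mean on simulated data (both choices are illustrative assumptions; the paper applies the formula to model-selected fitted values):

```python
import numpy as np

def smoothed_sd(Ystar, tstar):
    """Estimate [ sum_j cov_j^2 ]^(1/2), where cov_j is the empirical
    covariance between the count Y*_ij and the replication t*_i."""
    tbar = tstar.mean()
    cov_j = ((Ystar - Ystar.mean(axis=0)) * (tstar - tbar)[:, None]).mean(axis=0)
    return np.sqrt(np.sum(cov_j ** 2))

rng = np.random.default_rng(0)
n, B = 50, 4000
y = rng.normal(size=n)

# Y*_ij: how often point j appears in bootstrap sample i; t*_i: that sample's mean
Ystar = np.stack([np.bincount(rng.integers(0, n, n), minlength=n) for _ in range(B)])
tstar = Ystar @ y / n

sd_smooth = smoothed_sd(Ystar, tstar)
sd_plain = tstar.std()        # usual bootstrap sd of the raw statistic
print(sd_smooth, sd_plain)    # for an already-smooth statistic these nearly agree
```

For a smooth statistic like the mean the two estimates essentially coincide; the gain appears when t(·) involves a discontinuous model-selection step.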

Page 35: Improvement on the Traditional Standard Error

s̃d_B = [ ∑_{j=1}^{n} ĉovⱼ² ]^{1/2}

is always less than the bootstrap estimate of the standard deviation of the unsmoothed statistic,

ŝd_B = [ B⁻¹ ∑_{i=1}^{B} (t*ᵢ − t*·)² ]^{1/2}

22 / 43

Page 36: Three Types of Confidence Intervals

1. Standard: µ̂ ± 1.96 ŝd_B

2. Percentile: [µ̂*^(0.025), µ̂*^(0.975)]

3. Smoothed: µ̃ ± 1.96 s̃d_B

23 / 43
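All three intervals come from the same set of replications. A self-contained Python sketch, again with t(·) taken to be the sample mean on simulated data (an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n, B = 50, 4000
y = rng.normal(loc=3.0, size=n)
mu_hat = y.mean()                        # original (unsmoothed) estimate

# count matrix Y*_ij and replications t*_i
Ystar = np.stack([np.bincount(rng.integers(0, n, n), minlength=n) for _ in range(B)])
tstar = Ystar @ y / n
mu_tilde = tstar.mean()                  # smoothed (bagged) estimate

sd_plain = tstar.std()                   # sd_B for the unsmoothed statistic
cov_j = ((Ystar - Ystar.mean(0)) * (tstar - mu_tilde)[:, None]).mean(0)
sd_smooth = np.sqrt(np.sum(cov_j ** 2))  # Efron's smoothed sd

standard   = (mu_hat - 1.96 * sd_plain,    mu_hat + 1.96 * sd_plain)
percentile = (np.quantile(tstar, 0.025),   np.quantile(tstar, 0.975))
smoothed   = (mu_tilde - 1.96 * sd_smooth, mu_tilde + 1.96 * sd_smooth)
print(standard, percentile, smoothed)
```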


Page 39: L1-Norm Penalty Functions

Recall the optimization problem of interest:

max_β  ℓ_n(β) − n ∑_{j=1}^{p} p(|βⱼ|; λ)

24 / 43

Page 40: LASSO, SCAD and MCP Penalties

LASSO (Tibshirani, 1996):

p(|β|; λ) = λ|β|

SCAD (Fan and Li, 2001), defined through its derivative:

p′(|β|; λ, γ) = λ sign(β) { I(|β| ≤ λ) + [(γλ − |β|)₊ / ((γ − 1)λ)] I(|β| > λ) },  γ > 2

MCP (Zhang, 2010):

p(|β|; λ, γ) = λ|β| − β²/(2γ)   if |β| ≤ γλ
             = γλ²/2            if |β| > γλ

25 / 43
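The three penalties code up directly from the formulas above. A short Python sketch; the default values of γ are the conventional ones and are an assumption, since the slide fixes none:

```python
import numpy as np

def lasso_pen(beta, lam):
    """LASSO: p(|b|; lam) = lam * |b|."""
    return lam * np.abs(beta)

def scad_deriv(beta, lam, gam=3.7):
    """SCAD penalty derivative p'(|b|; lam, gam), gam > 2."""
    ab = np.abs(beta)
    inner = (ab <= lam).astype(float)
    outer = np.maximum(gam * lam - ab, 0.0) / ((gam - 1) * lam) * (ab > lam)
    return lam * np.sign(beta) * (inner + outer)

def mcp_pen(beta, lam, gam=3.0):
    """MCP: lam*|b| - b^2/(2*gam) for |b| <= gam*lam, else gam*lam^2/2."""
    ab = np.abs(beta)
    return np.where(ab <= gam * lam,
                    lam * ab - ab ** 2 / (2 * gam),
                    gam * lam ** 2 / 2)
```

Note that MCP is continuous at |β| = γλ, where both branches equal γλ²/2, and that the SCAD derivative vanishes for |β| > γλ, so large coefficients are left unpenalized at the margin.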

Page 41: Software

Analysis was performed in R.

LASSO implemented with the glmnet package (Friedman, Hastie, Tibshirani, 2013).

SCAD and MCP via the coordinate descent algorithm (Breheny and Huang, 2011) in the ncvreg package.

BIC and Cp model selection using the leaps package (Lumley, 2009).

26 / 43

Page 42: [Histogram: fitted values µ̂₉₅ for subject 95, from B = 4000 nonparametric bootstrap replications; one panel each for MCP, SCAD and LASSO.]

27 / 43

Page 43: [Histogram: fitted values µ̂₉₅ for subject 95, from B = 4000 nonparametric bootstrap replications; one panel each for BIC and Cp.]

28 / 43

Page 44: [Plot: 95% confidence intervals (standard, quantile, smooth) for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for the MCP, SCAD and LASSO penalties.]

29 / 43

Page 45: [Plot: lengths of 95% confidence intervals (standard, quantile, smooth) for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for Cp and BIC.]

30 / 43

Page 46: Table: Prostate data, B = 4000, observation 95

model   type      fitted value   sd     length   coverage
LASSO   standard  3.62           0.31   1.21     0.94
        quantile                        1.20     0.95
        smooth    3.57           0.29   1.14     0.93
SCAD    standard  3.60           0.35   1.37     0.95
        quantile                        1.33     0.95
        smooth    3.62           0.33   1.28     0.93
MCP     standard  3.60           0.35   1.38     0.96
        quantile                        1.35     0.95
        smooth    3.61           0.33   1.29     0.94
BIC     standard  5.50           4.75   18.62    0.84
        quantile                        16.05    0.95
        smooth    3.22           3.46   13.55    0.83
Cp      standard  5.13           5.11   20.02    0.86
        quantile                        16.15    0.95
        smooth    0.64           4.40   17.24    0.97

31 / 43

Page 47: An Example: Parametric Bootstrap Analysis

Obtain the OLS estimates µ̂_OLS based on the full model.

Generate

y* ∼ N(µ̂_OLS, I)

Full-model bootstrap:

y* → (Cp) → M*, β̂*_{M*} → µ̂* = X_{M*} β̂*_{M*}

Repeat B = 4000 times → t*ᵢⱼ = µ̂*ᵢⱼ. Smoothed estimates:

sⱼ = B⁻¹ ∑_{i=1}^{B} t*ᵢⱼ

32 / 43
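A compact Python sketch of this parametric loop on simulated data. For simplicity the candidate list here is the nested sequence of the first k columns rather than the regsubsets best-subset list, and the design matrix, true coefficients, and reduced B are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 97, 8
X = rng.normal(size=(n, p))              # stand-in for standardized clinical measures
y = X @ np.array([0.6, 0.3, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0]) + rng.normal(size=n)

# Full-model OLS fit: mu_OLS and sigma^2 for the Cp penalty
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
mu_ols = X @ beta_ols
sigma2 = np.sum((y - mu_ols) ** 2) / (n - p)

def cp_fitted(y_star):
    """Choose among nested models m1..m8 by Cp; return fitted values."""
    best = None
    for k in range(1, p + 1):
        Xk = X[:, :k]
        b, *_ = np.linalg.lstsq(Xk, y_star, rcond=None)
        fit = Xk @ b
        cp = np.sum((y_star - fit) ** 2) / n + 2 * sigma2 * k / n
        if best is None or cp < best[0]:
            best = (cp, fit)
    return best[1]

B = 300                                  # reduced from 4000 for speed
t_star = np.stack([cp_fitted(mu_ols + rng.normal(size=n)) for _ in range(B)])
s = t_star.mean(axis=0)                  # smoothed fitted values s_j
print(s[94])                             # smoothed estimate for subject 95
```

Each replication draws y* ∼ N(µ̂_OLS, I), reruns the Cp selection, and records the resulting fitted vector; averaging over replications gives the smoothed estimates sⱼ.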

Page 48: [Histogram: fitted values µ̂₉₅ for subject 95, from B = 4000 parametric bootstrap replications of the Cp-chosen model; 53% of the replications are greater than the original estimate of 3.6.]

33 / 43

Page 49: [Histogram: fitted values µ̂₉₅ for subject 95, from B = 4000 parametric bootstrap replications, separated by the three models most frequently chosen by Cp (m6, m7, m8); original estimate 3.6 based on the Cp-chosen model.]

34 / 43

Page 50: [Boxplot: fitted values µ̂₉₅ for subject 95, for the model chosen by the Cp criterion, based on B = 4000 parametric bootstrap samples; selection frequencies m1: 3%, m2: 6%, m3: 16%, m4: 12%, m5: 13%, m6: 14%, m7: 17%, m8: 19%.]

35 / 43

Page 51: [Histogram: fitted values µ̂₉₅ for subject 95, from B = 4000 parametric bootstrap replications of the BIC-chosen model; 40% of the replications are greater than the original estimate of 3.7.]

36 / 43

Page 52: [Histogram: fitted values µ̂₉₅ for subject 95, from B = 4000 parametric bootstrap replications, separated by the three models most frequently chosen by BIC (m1, m2, m3; frequencies 27%, 20%, 18%); original estimate 3.7 based on the BIC-chosen model.]

37 / 43

Page 53: [Boxplot: fitted values µ̂₉₅ for subject 95, for the model chosen by the BIC criterion, based on B = 4000 parametric bootstrap samples; selection frequencies m1: 20%, m2: 18%, m3: 27%, m4: 13%, m5: 9%, m6: 5%, m7: 5%, m8: 3%.]

38 / 43

Page 54: Discussion

Improvements for regularized procedures where the tuning parameters are also chosen in a data-driven fashion?

What about GLMs?

Why the parametric bootstrap?

39 / 43

Page 55: Family

Page 56: Roots

Page 57: What I Have Done So Far

1. BSc Actuarial Math, Concordia (2005-2008)
2. Pension actuary (2008-2011)
3. RA at the Chest with Andrea Benedetti (2011-2012)
4. MSc Biostats, Queen's (2012-2013)

Page 58: What's Next?

1. PhD Biostatistics, McGill (2013-???)
2. Supervisor: Celia Greenwood (Statistical Genetics)