Sections: Bradley Efron | Motivation | Bootstrap Smoothing | Results
Estimation and Accuracy after Model Selection, by Bradley Efron (Stanford)

Sahir Rai Bhatnagar
McGill University
April 7, 2014
Who?

- Born in St. Paul, Minnesota in 1938 to Jewish-Russian immigrants
- B.S., Mathematics, Caltech (1960)
- Ph.D., Statistics (1964), under the direction of Rupert Miller and Herb Solomon
- Professor of Statistics at Stanford for the past 50 years
Achievements

- Best known for the Bootstrap, Annals of Statistics (1977)
- Founding Editor, Annals of Applied Statistics
- Awarded the Guy Medal in Gold by the RSS (2014); 34 awarded since 1892, including Rao, Cox, Fisher, and Nelder
National Medal of Science (2005)

Established by Congress in 1959 and administered by the National Science Foundation, the medal is the nation's highest scientific honour.
Some Quotes

- "Statistics is the science of information gathering, especially when the information arrives in little pieces instead of big ones."
- "Statistics did not come naturally to me. Dad's keeping score for the baseball league helped a lot."
- "I spent the first year at Stanford in the Math Department... After, I started taking stats courses, which I thought would be easy. In fact I found them harder."
Motivation

- A Quick Review of the Bootstrap
- Typical Model Selection Setting
- Cholesterol Data Example
- Prostate Data Example
Typical Model Selection Setting

- Look at the data: one response, many covariates
- Identify a list of candidate models M: 2^p submodels; linear, quadratic, cubic, ...
- Perform model selection (see Abbas' class notes)
- Do inference based on the chosen model: prediction, confidence intervals

Today's Question: Should we care about the variability of the variable selection step in our post-selection inference?
An Example: Cholesterol Data

- n = 164 men took cholestyramine (meant to reduce cholesterol in the blood) for 7 years
- x: a compliance measure, adjusted so that x ~ N(0, 1)
- y: cholesterol decrease
- Perform a regression of y on x
- We want to predict cholesterol decrease for a given compliance value:

    µ = E[y | x]
An Example (continued)

Multiple linear regression model:

    y = Xβ + ε,   ε_i ~ N(0, σ²)

Six candidate models, M = {linear, quadratic, ..., sextic}, e.g.

    y = β0 + β1 x + β2 x² + ... + β6 x⁶ + ε

Cp criterion for model selection:

    Cp(M) = SS_res(M)/n + 2σ² p_M / n

where the first term measures goodness of fit, the second penalizes complexity, and p_M is the number of parameters in model M. Use the OLS estimate of β from the chosen model and predict:

    µ̂ = X β̂
An Example: Nonparametric Bootstrap Analysis

Bootstrap the data:

    data* = {(x_j, y_j)*, j = 1, ..., n}

where the pairs (x_j, y_j)* are drawn randomly with replacement from the original data. Then:

    data*  --Cp-->  M*  --OLS-->  β̂*_{M*}  -->  µ̂* = X_{M*} β̂*_{M*}

Repeat B = 4000 times.
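The resampling loop above can be sketched in a few lines of Python (the talk's actual analysis was done in R; the toy data, the candidate list of polynomial degrees, the reduced B, and the helper `cp_select` are all stand-ins introduced here for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def cp_select(x, y, max_deg=3, sigma2=1.0):
    """Pick the polynomial degree minimising Cp; return its OLS coefficients."""
    n = len(y)
    best_cp, best_beta = np.inf, None
    for deg in range(1, max_deg + 1):
        X = np.vander(x, deg + 1, increasing=True)   # columns 1, x, ..., x^deg
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        cp = resid @ resid / n + 2 * sigma2 * (deg + 1) / n
        if cp < best_cp:
            best_cp, best_beta = cp, beta
    return best_beta

# toy data standing in for the cholesterol example (true signal is quadratic)
n, B = 50, 200
x = rng.normal(size=n)
y = 1 + 2 * x - 0.5 * x**2 + rng.normal(size=n)

# nonparametric bootstrap: resample (x_j, y_j) pairs with replacement,
# redo the Cp selection and the OLS fit each time, predict at x = 1
mu_star = np.empty(B)
for i in range(B):
    idx = rng.integers(0, n, size=n)
    beta = cp_select(x[idx], y[idx])
    mu_star[i] = np.polyval(beta[::-1], 1.0)   # beta is in increasing order

print(round(float(mu_star.mean()), 2), round(float(mu_star.std()), 2))
```

The histogram of `mu_star` is the analogue of the fitted-value histograms on the following slides: its spread reflects both coefficient noise and the jumpiness of the selection step.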
[Figure reproduced from Efron 2013]
[Figure reproduced from Efron 2013]
Prostate Data

- Examine the relation between the level of PSA and clinical measures
- n = 97 men who were about to receive a prostatectomy
- x = (x1, ..., x8): clinical measures (adjusted so that x ~ N(0, 1))
- y = log PSA
- Perform a regression of y on x
- 8 candidate models were identified using regsubsets with nbest = 1
- We want to estimate

    µ_j = E[y | x_j],   j = 1, ..., 97
[Figure: fitted values for subject 95, from B = 4000 nonparametric bootstrap replications of the Cp-chosen model; 60% of the replications exceed the original estimate of 3.6]
[Figure: fitted values for subject 95, from B = 4000 nonparametric bootstrap replications, separated by the three models most frequently chosen by Cp (m3: 18%, m5: 22%, m7: 24%)]
[Figure: boxplots of fitted values for subject 95 by Cp-chosen model (m2-m8, selection frequencies 1%, 18%, 12%, 22%, 15%, 24%, 8%; Model 7 most frequent), based on B = 4000 nonparametric bootstrap samples]
Questions

- Are you convinced there is a problem in the way we do post-selection inference?
- Is the juice worth the squeeze?
Bootstrap Smoothing

- Idea
- Standard Errors
- Theorem
- Confidence Intervals
Bagging (Breiman 1996)

Replace the original estimator µ̂ = t(y) with the bootstrap average

    µ̃ = s(y) = (1/B) Σ_{i=1}^{B} t(y*_i)

where y*_i is the i-th bootstrap sample. Also known as model averaging.

"If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy."
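A minimal Python sketch of bagging, using a deliberately non-smooth toy statistic (a thresholded sample mean, introduced here as a stand-in for an estimator that jumps with model selection; it is not an example from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)

def t(y):
    """A non-smooth statistic: the sample mean, zeroed out when it is
    close to 0 (mimicking a hard selection step)."""
    m = float(y.mean())
    return m if abs(m) > 0.1 else 0.0

y = rng.normal(loc=0.12, scale=1.0, size=40)

# bagging: average the statistic over bootstrap resamples of y
B = 500
t_star = np.array([t(rng.choice(y, size=len(y), replace=True)) for _ in range(B)])

mu_original = t(y)                 # jumpy plug-in estimate t(y)
mu_smooth = float(t_star.mean())   # bagged / smoothed estimate s(y)
print(mu_original, mu_smooth)
```

Whereas `t(y)` jumps discontinuously as the data cross the threshold, `s(y)` varies smoothly, which is exactly the property exploited in the rest of the talk.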
Main Contribution of this Paper

- t*_i = t(y*_i), i = 1, ..., B  (value of the statistic in bootstrap sample i)
- Y*_ij = number of times the j-th data point appears in the i-th bootstrap sample
- cov_j = cov(Y*_ij, t*_i)

The nonparametric estimate of the standard deviation for the ideal smoothed bootstrap statistic µ̃ = s(y) = (1/B) Σ_{i=1}^{B} t(y*_i) is

    sd = [ Σ_{j=1}^{n} cov_j² ]^{1/2}
Main Contribution of this Paper (continued)

Note that cov_j = cov(Y*_ij, t*_i) is an unknown quantity, so we must estimate it. The estimate of the standard deviation for µ̃ = s(y) in the non-ideal case is

    sd_B = [ Σ_{j=1}^{n} ĉov_j² ]^{1/2}

where

    ĉov_j = (1/B) Σ_{i=1}^{B} (Y*_ij − Y*_·j)(t*_i − t*_·)

    Y*_·j = (1/B) Σ_{i=1}^{B} Y*_ij,    t*_· = (1/B) Σ_{i=1}^{B} t*_i
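This covariance computation takes only a few lines in Python (toy data and a median statistic, chosen here for illustration rather than taken from the talk); with B reasonably large the smoothed estimate comes out no larger than the plain bootstrap standard deviation, illustrating the inequality on the next slide:

```python
import numpy as np

rng = np.random.default_rng(2)
n, B = 30, 1000
y = rng.normal(size=n)

t = lambda s: float(np.median(s))   # any statistic t(y*)

Y = np.zeros((B, n))                # Y*_ij: count of point j in sample i
t_star = np.empty(B)
for i in range(B):
    idx = rng.integers(0, n, size=n)
    np.add.at(Y[i], idx, 1)         # tally the resampling counts
    t_star[i] = t(y[idx])

# cov_j: empirical covariance of Y*_ij and t*_i over the B replications
cov = ((Y - Y.mean(axis=0)) * (t_star - t_star.mean())[:, None]).mean(axis=0)

sd_smooth = float(np.sqrt((cov ** 2).sum()))   # Efron's estimate for s(y)
sd_plain = float(t_star.std())                 # usual bootstrap sd for t(y)
print(round(sd_smooth, 3), round(sd_plain, 3))
```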
Improvement on the Traditional Standard Error

The smoothed estimate

    sd_B = [ Σ_{j=1}^{n} ĉov_j² ]^{1/2}

is always less than the bootstrap estimate of the standard deviation of the unsmoothed statistic,

    ŝd_B = [ (1/B) Σ_{i=1}^{B} (t*_i − t*_·)² ]^{1/2}
Three Types of Confidence Intervals

1. Standard:   µ̂ ± 1.96 ŝd_B
2. Percentile: [µ̂*(0.025), µ̂*(0.975)]
3. Smoothed:   µ̃ ± 1.96 sd_B
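All three intervals are cheap to compute once the replications are in hand. A minimal sketch, with illustrative numbers loosely modelled on the SCAD row of the prostate table (the replication vector is simulated here, not real output from the talk's analysis):

```python
import numpy as np

rng = np.random.default_rng(3)

# stand-ins for quantities a bootstrap run would produce
mu_hat, sd_hat = 3.60, 0.35                    # original estimate and plain bootstrap sd
mu_star = rng.normal(3.57, 0.33, size=4000)    # simulated replications t(y*_i)
mu_tilde = float(mu_star.mean())               # smoothed (bagged) estimate
sd_smooth = 0.33                               # assumed covariance-based sd estimate

standard = (mu_hat - 1.96 * sd_hat, mu_hat + 1.96 * sd_hat)
percentile = tuple(np.quantile(mu_star, [0.025, 0.975]))
smoothed = (mu_tilde - 1.96 * sd_smooth, mu_tilde + 1.96 * sd_smooth)
print(standard, percentile, smoothed)
```

With sd_smooth < sd_hat, the smoothed interval is the shortest of the three, which is the pattern seen in the prostate table below.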
Results

- Setting
- Prostate Data: Revisited
- Parametric Bootstrap
- Discussion
L1-Norm Penalty Functions

Recall the optimization problem of interest:

    max_β  ℓ_n(β) − n Σ_{j=1}^{p} p(|β_j|; λ)
LASSO, SCAD and MCP Penalties

LASSO (Tibshirani, 1996):

    p(|β|; λ) = λ|β|

SCAD (Fan and Li, 2001), defined through its derivative:

    p′(|β|; λ, γ) = λ sign(β) { I(|β| ≤ λ) + [(γλ − |β|)₊ / ((γ − 1)λ)] I(|β| > λ) },   γ > 2

MCP (Zhang, 2010):

    p(|β|; λ, γ) = λ|β| − β²/(2γ)   if |β| ≤ γλ
                 = γλ²/2            if |β| > γλ
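These penalties translate directly into code. A minimal Python rendering (the γ defaults of 3.7 for SCAD and 3 for MCP are common conventions, assumed here rather than stated in the talk):

```python
import numpy as np

def lasso_pen(b, lam):
    """LASSO penalty p(|b|; lam) = lam * |b|."""
    return lam * np.abs(b)

def scad_pen_deriv(b, lam, gamma=3.7):
    """SCAD, as usually specified, through its derivative p'(|b|; lam, gamma)."""
    ab = np.abs(b)
    return lam * np.sign(b) * (
        (ab <= lam) + np.maximum(gamma * lam - ab, 0) / ((gamma - 1) * lam) * (ab > lam)
    )

def mcp_pen(b, lam, gamma=3.0):
    """MCP penalty: quadratic up to gamma*lam, then flat at gamma*lam^2/2."""
    ab = np.abs(b)
    return np.where(ab <= gamma * lam,
                    lam * ab - ab ** 2 / (2 * gamma),
                    gamma * lam ** 2 / 2)

b = np.linspace(-3, 3, 7)
print(lasso_pen(b, 0.5))
print(mcp_pen(b, 0.5))
```

Note how MCP and SCAD flatten out for large |β|, which is what reduces the bias that LASSO's constant-slope penalty imposes on large coefficients.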
Software

- Analysis was performed in R
- LASSO implemented with the glmnet package (Friedman, Hastie, Tibshirani, 2013)
- SCAD and MCP via the coordinate descent algorithm (Breheny and Huang, 2011) in the ncvreg package
- BIC and Cp model selection with the leaps package (Lumley, 2009)
[Figure: fitted values for subject 95, from B = 4000 nonparametric bootstrap replications, under the MCP, SCAD and LASSO penalties]
[Figure: fitted values for subject 95, from B = 4000 nonparametric bootstrap replications, under BIC and Cp model selection]
[Figure: 95% confidence intervals (standard, quantile, smooth) for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for the MCP, SCAD and LASSO penalties]
[Figure: lengths of 95% confidence intervals (standard, quantile, smooth) for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for Cp and BIC]
Table: Prostate data, B = 4000, Observation 95

    model  type      fitted value   sd     length   coverage
    LASSO  standard  3.62           0.31    1.21    0.94
           quantile                         1.20    0.95
           smooth    3.57           0.29    1.14    0.93
    SCAD   standard  3.60           0.35    1.37    0.95
           quantile                         1.33    0.95
           smooth    3.62           0.33    1.28    0.93
    MCP    standard  3.60           0.35    1.38    0.96
           quantile                         1.35    0.95
           smooth    3.61           0.33    1.29    0.94
    BIC    standard  5.50           4.75   18.62    0.84
           quantile                        16.05    0.95
           smooth    3.22           3.46   13.55    0.83
    Cp     standard  5.13           5.11   20.02    0.86
           quantile                        16.15    0.95
           smooth    0.64           4.40   17.24    0.97
An Example: Parametric Bootstrap Analysis

- Obtain OLS estimates µ̂_OLS based on the full model
- Generate y* ~ N(µ̂_OLS, I)  (full-model bootstrap)
- Then:

    y*  --Cp-->  M*, β̂*_{M*}  -->  µ̂* = X_{M*} β̂*_{M*}

- Repeat B = 4000 times, giving t*_ij = µ̂*_ij
- Smoothed estimates:

    s_j = (1/B) Σ_{i=1}^{B} t*_ij
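A compact Python sketch of this parametric-bootstrap pipeline (simulated data with a reduced B; the hypothetical helper `cp_fit` searches nested models over the first k columns, a simplification of the talk's regsubsets candidate list):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, B = 97, 8, 200

X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 0.5, 0, 0, 0.3, 0, 0, 0])
y = X @ beta_true + rng.normal(size=n)

# step 1: OLS fit of the FULL model gives mu_hat
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
mu_hat = X @ beta_full

def cp_fit(X, y, sigma2=1.0):
    """Cp selection over nested models using the first k columns;
    returns the fitted values of the winning model."""
    n = len(y)
    best_cp, best_mu = np.inf, None
    for k in range(1, X.shape[1] + 1):
        Xk = X[:, :k]
        b, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        r = y - Xk @ b
        cp = r @ r / n + 2 * sigma2 * k / n
        if cp < best_cp:
            best_cp, best_mu = cp, Xk @ b
    return best_mu

# step 2: parametric bootstrap y* ~ N(mu_hat, I), reselect + refit each time
t_star = np.empty((B, n))
for i in range(B):
    y_star = mu_hat + rng.normal(size=n)
    t_star[i] = cp_fit(X, y_star)

s = t_star.mean(axis=0)   # smoothed fitted values s_j
print(s[:3])
```

Averaging `t_star` over replications gives the smoothed fit s_j for every subject at once, and the same replication matrix feeds the covariance-based standard error from the Bootstrap Smoothing section.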
[Figure: fitted values for subject 95, from B = 4000 parametric bootstrap replications of the Cp-chosen model; 53% of the replications exceed the original estimate of 3.6]
[Figure: fitted values for subject 95, from B = 4000 parametric bootstrap replications, separated by the three models most frequently chosen by Cp (m6, m7, m8)]
[Figure: boxplots of fitted values for subject 95 by Cp-chosen model (m1-m8, selection frequencies 3%, 6%, 16%, 12%, 13%, 14%, 17%, 19%; Model 8 most frequent), based on B = 4000 parametric bootstrap samples]
[Figure: fitted values for subject 95, from B = 4000 parametric bootstrap replications of the BIC-chosen model; 40% of the replications exceed the original estimate of 3.7]
[Figure: fitted values for subject 95, from B = 4000 parametric bootstrap replications, separated by the three models most frequently chosen by BIC (m1: 27%, m2: 20%, m3: 18%)]
[Figure: boxplots of fitted values for subject 95 by BIC-chosen model (m1-m8, selection frequencies 20%, 18%, 27%, 13%, 9%, 5%, 5%, 3%; Model 3 most frequent), based on B = 4000 parametric bootstrap samples]
Discussion

- Improvements for regularized procedures, where tuning parameters are also chosen in a data-driven fashion
- GLMs?
- Why the parametric bootstrap?
What I have done so far
1. BSc Actuarial Math, Concordia (2005-2008)
2. Pension actuary (2008-2011)
3. RA at the Chest with Andrea Benedetti (2011-2012)
4. MSc Biostats, Queen's (2012-2013)

What's Next?
1. PhD Biostatistics, McGill (2013-???)
2. Supervisor: Celia Greenwood (Statistical Genetics)