1
Common Model Selection
Statistics: AIC and BIC
Addictions Research Seminar
August 8, 2007
2
A visit from a flying mutant ______.
© David Farley
3
Elephant Model
4
Objectives
• Know what AIC and BIC do
• Know the role that some statisticians
think AIC and BIC should play
in research
• Be aware of alternatives
• Motivation to look further
5
Outline
• Objectives
• Model Selection (MS) Problems
• Commonly Used MS Statistics
– Motivations
– Use
• Alternatives and Recommendations
6
Model Selection
• What you do depends on:
– Study Design
– Suite of Collected Variables
– Purpose
– Philosophy on model building
7
Model Selection
• Model selection is not model testing
• Psychological model/theory vs.
statistical model
8
Research Context
• Null Hypothesis Significance Testing
• Model Testing
– Testing Structure
– Parameter Testing
• Exploratory/Model Building
– Descriptive motivations
– Predictive utility
– Evidence production
9
Model Selection: Approaches
1. Just select the full model only
2. Use stepwise selection, ignore selection uncertainty
3. Use MS statistic, ignore selection uncertainty
4. Use MS statistic & consider uncertainty
5. Do multimodel inference.
6. First reduce predictors and thoughtfully weigh models considering MS statistics
10
Although [MS Stats] are helpful exploratory tools, the model-building process should utilize theory and common sense.
Alan Agresti
Model selection is rarely based solely on [MS Stats] but depends also on the purpose of the analysis and subject matter information.
Jouni Kuha
11
Model Selection Criteria
• Test of hypotheses (NHST)
• Ad hoc methods
• Optimization of some selection criteria
– Criteria based on MSE, MS prediction error
– Information Criteria
– Consistent estimators of P(true model)
12
NHST does not mesh with IC in model
selection
“A very common mistake seen in the applied literature is to “test” to see whether the best model is significantly better than the second-best model.”
Anderson & Burnham 2002
13
Using Statistics to Help Guide
Model Fit (MF)
• R²
• χ² goodness-of-fit
• MSE
Model Selection (MS)
• AIC
• BIC
• TIC, NIC, EIC, FIC, GIC, SIC, QAIC, Cp, PRESS, CAICF, MDL, HQ, Vapnik-Chervonenkis dimension…
14
15
16
AIC Motivation
• A measure of the predictive
performance of the models
• It is based on information loss
17
AIC Motivation
• Based on Kullback-Leibler (K-L) information
loss
• I(f,g) is the information loss due to the use of
a model to approximate reality
• It turns out that you can compare models’ relative information loss without being able to describe reality exactly
18
AIC Motivation
Akaike found that the maximized log-likelihood of a model is a biased estimate of the model’s relative expected K-L information; the bias is approximately the number of estimated parameters p. Correcting for it gives:

relative E(K-L) ≈ ℓ(θ̂ | data) − p
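The correction can be sketched in a few lines of Python. This is a minimal illustration, not any package's API; the Gaussian model and the data are hypothetical, and AIC is computed in its usual −2ℓ + 2p form (the bias-corrected estimate above multiplied by −2).

```python
import math

def gaussian_loglik(data, mu, sigma2):
    """Log-likelihood of the data under a Normal(mu, sigma2) model."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma2))

def aic(loglik, p):
    """AIC = -2 * log-likelihood + 2 * (number of estimated parameters)."""
    return -2 * loglik + 2 * p

# Hypothetical data; the MLEs for a normal are the sample mean
# and the (biased) sample variance, so p = 2 parameters.
data = [2.1, 1.9, 2.4, 2.0, 1.8, 2.2]
mu = sum(data) / len(data)
sigma2 = sum((x - mu) ** 2 for x in data) / len(data)

print(aic(gaussian_loglik(data, mu, sigma2), p=2))
```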
19
AIC Functionality
• AIC selects a best model in terms of the
bias/variance trade-off, not a quasi-true
model
• The target model changes with the
sample size.
20
• AIC is not consistent: without finite-sample adjustments, there is always some probability that it selects a model with too many variables.
• AIC is efficient: the expected prediction error of AIC-selected models is the smallest attainable as N grows large.
21
What AIC Values Mean
Raw AIC values are not interpretable; they contain arbitrary constants. Only differences within a model set matter:

Δi = AICi − AICmin

4 ≤ Δi ≤ 7: considerably less support
Δi > 10: essentially no support
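The Δi support categories can be encoded directly. A minimal sketch; the thresholds follow the Burnham & Anderson guidelines quoted above, and the handling of the gaps between the published ranges is an assumption:

```python
def delta_aic(aic_values):
    """Differences from the best (minimum) AIC in the candidate set."""
    best = min(aic_values)
    return [a - best for a in aic_values]

def support(delta):
    """Rough support category for a Delta_i value (boundaries between the
    published ranges are filled in here as an assumption)."""
    if delta <= 2:
        return "substantial support"
    if delta <= 7:
        return "considerably less support"
    if delta > 10:
        return "essentially no support"
    return "little support"

# Hypothetical AIC values for four candidate models
for d in delta_aic([-176, -174, -166, -151]):
    print(d, support(d))
```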
22
“When all the models have very low weights,
there is no inferential credibility for any single
model regarding what are the “important”
predictor variables. It is foolish to think that
the variables included in the best model are
“the” important ones and the excluded are not
important.”
Burnham & Anderson 2002
23
ΔAIC = 2[ℓ(θ̂₂) − ℓ(θ̂₁)] − 2(p₂ − p₁)
ΔBIC = 2[ℓ(θ̂₂) − ℓ(θ̂₁)] − (log n)(p₂ − p₁)
24
BIC Motivation
“Aim of Bayesian approach is to identify the model with the highest probability of being the true model”
Kuha 2004
“The assumed purpose of the BIC-selected model was often simple prediction, as opposed to scientific understanding of the system under study”
Burnham & Anderson 2002
25
BIC Motivation
BF₂₁ = p(D | M₂) / p(D | M₁)
Bayes Factor = evidence in favor
of model 2 over model 1
BIC is an approximation of a transformation of
the Bayes Factor (for a limited set of priors).
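That transformation is easy to sketch: the BIC difference approximates 2·log of the Bayes factor, so exp(ΔBIC/2) approximates the factor itself. The BIC values below are hypothetical, and the approximation holds only for the limited set of priors noted above:

```python
import math

def approx_bayes_factor(bic_1, bic_2):
    """Approximate Bayes factor for model 2 over model 1 from BIC values,
    using 2 * log(BF21) ~= BIC1 - BIC2 (valid only for certain priors)."""
    return math.exp((bic_1 - bic_2) / 2)

# Hypothetical BIC values: model 2 has the lower (better) BIC
print(approx_bayes_factor(433.8, 433.4))  # modest evidence for model 2
```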
26
BIC does not always need to be a good approximation of the Bayes Factor if it is used mainly to identify which of the models has the highest posterior probability.
27
Justification for BIC
BIC is consistent: if the true model is among the candidates, BIC selects it with probability approaching 1 as n → ∞.
28
Meaning of BIC Values
pi is the posterior probability that model i is the true model (assuming that there is a true model and that it is in your model set):

pi = exp(−ΔBICi / 2) / Σr=1..R exp(−ΔBICr / 2)
29
PDA Model
Each model includes a subset of the candidate predictors (linear, quadratic, and cubic time terms; HDRS; Tx; PDA1; Attendance), marked with x in the original slide:

Model   AIC    ΔAIC   BIC    P(T.M.)
1       −174   2      −112   0.002428
2       −141   35     −70    1.84e−12
3       −166   10     −100   6.02e−06
4       −151   25     −99    3.65e−06
5       −176   0      −124   0.979624
6       −158   18     −116   0.017942
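The P(T.M.) column can be checked from the BIC column alone. A minimal sketch using the posterior-probability formula from the previous slide (equal prior model probabilities assumed); running it closely reproduces the table's values:

```python
import math

def bic_posterior_probs(bics):
    """Posterior model probabilities from BIC values, assuming equal prior
    odds and that the true model is in the set:
    p_i = exp(-dBIC_i/2) / sum_r exp(-dBIC_r/2)."""
    deltas = [b - min(bics) for b in bics]
    weights = [math.exp(-d / 2) for d in deltas]
    total = sum(weights)
    return [w / total for w in weights]

# BIC column from the PDA table above (models 1-6)
bics = [-112, -70, -100, -99, -124, -116]
probs = bic_posterior_probs(bics)
print([round(p, 6) for p in probs])  # model 5 dominates, near 0.98
```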
30
Similarities
• Penalized model selection criteria
• Data must be fixed
• They can be special cases of each other
• Both good at approximating target quantities
• Bayesian or frequentist derivation
• Ambivalence
• Only as good as your data

Differences
• BIC is dimension-consistent; AIC approximates relative information loss
• BIC penalizes complex models more than AIC
• Definition of “good model”
• Need for a true model
31
Burnham and Anderson
Objection to BIC
We question the concept of a simple “true model” in the biological sciences and would surely think, if it existed, that it would not be in the set of candidate models.
There is nothing in the foundation of BIC that addresses a bias-variance trade-off, and hence addresses parsimony as a feature of BIC model selection.
32
Others’ Views
For model selection purposes, there is no clear
choice between AIC and BIC.
Kuha 2002
The BIC target model doesn’t depend on N, but we know the number of parameters selected will, so BIC can’t deliver on its objective in practice.
KMC
33
“All models are wrong, some models are useful.”
George Box
Any model is just a simplification of reality.
Select a model that is a useful description or powerful predictor.
34
Simulation Results
• BIC better than AIC when the true
model is included as a candidate and
often better than AICc
• AIC does better when the true model is not in the set
• These are not universal results
35
Simulation Results
[Figure: relative error of selected models; Kuha 2004]
36
Alternative Approaches Exist
• Direct
• Cross-validation
• Use all the models at the same time!
• Report out top contenders
Train → Validate → Test
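The train/validate/test idea can be sketched with a plain random split. The 60/20/20 fractions below are illustrative assumptions, not a recommendation from the slides:

```python
import random

def train_validate_test_split(rows, f_train=0.6, f_val=0.2, seed=0):
    """Shuffle rows and split into train/validate/test subsets.
    The remainder after train and validate goes to test."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * f_train)
    n_val = int(len(shuffled) * f_val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, validate, test = train_validate_test_split(list(range(100)))
print(len(train), len(validate), len(test))  # 60 20 20
```

Models are fit on the training set, compared on the validation set, and the winner's error is reported once on the untouched test set.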
37
Recommendations
• Establish a philosophy
• Conduct thoughtful model building
• Use MS stats as a guide only
• Use multiple stats simultaneously
38
Elephant Model
39
Objectives
• Know what AIC and BIC do.
• Know the role that some statisticians
think AIC and BIC should play
in research.
• Be aware of alternatives.
• Be motivated to learn more about AIC/BIC
40
© David Farley
41
Restricted Space and Directed Selection
• Akaike believed that the most important contribution of his general approach was the clarification of the importance of modeling and the need for substantial prior information on the system being studied.
• The importance of carefully defining a small set of candidate models cannot be overemphasized. (A & B 2002)