1
Common Model Selection
Statistics: AIC and BIC
Addictions Research Seminar
August 8, 2007
2
A visit from a flying mutant ______.
© David Farley
3
Elephant Model
4
Objectives
• Know what AIC and BIC do
• Know the role that some statisticians
think AIC and BIC should play
in research
• Be aware of alternatives
• Motivation to look further
5
Outline
• Objectives
• Model Selection (MS) Problems
• Commonly Used MS Statistics
– Motivations
– Use
• Alternatives and Recommendations
6
Model Selection
• What you do depends on:
– Study Design
– Suite of Collected Variables
– Purpose
– Philosophy on model building
7
Model Selection
• Model selection is not model testing
• Psychological model/theory vs.
statistical model
8
Research Context
• Null Hypothesis Significance Testing
• Model Testing
– Testing Structure
– Parameter Testing
• Exploratory/Model Building
– Descriptive motivations
– Predictive utility
– Evidence production
9
Model Selection: Approaches
1. Just select the full model only
2. Use stepwise selection, ignore selection uncertainty
3. Use MS statistic, ignore selection uncertainty
4. Use MS statistic & consider uncertainty
5. Do multimodel inference.
6. First reduce predictors and thoughtfully weigh models considering MS statistics
10
Although [MS Stats] are helpful exploratory tools, the model-building process should utilize theory and common sense.
Alan Agresti
Model selection is rarely based solely on [MS Stats] but depends also on the purpose of the analysis and subject matter information.
Jouni Kuha
11
Model Selection Criteria
• Test of hypotheses (NHST)
• Ad hoc methods
• Optimization of some selection criteria
– Criteria based on MSE, MS prediction error
– Information Criteria
– Consistent estimators of P(true model)
12
NHST does not mesh with IC in model
selection
“A very common mistake seen in the applied literature is to “test” to see whether the best model is significantly better than the second-best model.”
Anderson & Burnham 2002
13
Using Statistics to Help Guide
Model Fit (MF)
• R²
• χ² goodness-of-fit
• MSE
Model Selection (MS)
• AIC
• BIC
• TIC, NIC, EIC, FIC, GIC, SIC, QAIC, Cp, PRESS, CAICF, MDL, HQ, Vapnik-Chervonenkis dimension…
14
15
16
AIC Motivation
• A measure of the predictive
performance of the models
• It is based on information loss
17
AIC Motivation
• Based on Kullback-Leibler (K-L) information
loss
• I(f,g) is the information loss due to the use of
a model to approximate reality
• It turns out that you can compare models’ relative information loss without being able to describe reality exactly
18
AIC Motivation
Akaike found that the maximized log-likelihood of a model is a biased estimate of the model’s relative expected K-L information; the bias is approximately the number of estimated parameters p. Correcting for it gives:

relative E(K-L) ≈ ℓ(θ̂ | data) − p
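The correction can be sketched in a few lines of Python. This is a minimal illustration, not any package's API; the Gaussian model and the data are hypothetical, and AIC is computed in its usual −2ℓ + 2p form (the bias-corrected estimate above multiplied by −2).

```python
import math

def gaussian_loglik(data, mu, sigma2):
    """Log-likelihood of the data under a Normal(mu, sigma2) model."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma2))

def aic(loglik, p):
    """AIC = -2 * log-likelihood + 2 * (number of estimated parameters)."""
    return -2 * loglik + 2 * p

# Hypothetical data; the MLEs for a normal are the sample mean
# and the (biased) sample variance, so p = 2 parameters.
data = [2.1, 1.9, 2.4, 2.0, 1.8, 2.2]
mu = sum(data) / len(data)
sigma2 = sum((x - mu) ** 2 for x in data) / len(data)

print(aic(gaussian_loglik(data, mu, sigma2), p=2))
```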
19
AIC Functionality
• AIC selects a best model in terms of the
bias/variance trade-off, not a quasi-true
model
• The target model changes with the
sample size.
20
• AIC is not consistent: without finite-sample adjustments, there is always some probability that it selects a model with too many variables.
• AIC is efficient: the expected prediction error of AIC-selected models is the smallest attainable as N grows large.
21
What AIC Values Mean
Raw AIC values are not interpretable; they contain arbitrary constants. Only differences within a model set matter:

Δi = AICi − AICmin

4 ≤ Δi ≤ 7: considerably less support
Δi > 10: essentially no support
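The Δi support categories can be encoded directly. A minimal sketch; the thresholds follow the Burnham & Anderson guidelines quoted above, and the handling of the gaps between the published ranges is an assumption:

```python
def delta_aic(aic_values):
    """Differences from the best (minimum) AIC in the candidate set."""
    best = min(aic_values)
    return [a - best for a in aic_values]

def support(delta):
    """Rough support category for a Delta_i value (boundaries between the
    published ranges are filled in here as an assumption)."""
    if delta <= 2:
        return "substantial support"
    if delta <= 7:
        return "considerably less support"
    if delta > 10:
        return "essentially no support"
    return "little support"

# Hypothetical AIC values for four candidate models
for d in delta_aic([-176, -174, -166, -151]):
    print(d, support(d))
```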
22
“When all the models have very low weights,
there is no inferential credibility for any single
model regarding what are the “important”
predictor variables. It is foolish to think that
the variables included in the best model are
“the” important ones and the excluded are not
important.”
Burnham & Anderson 2002
23
ΔAIC = 2[ℓ(θ̂₂) − ℓ(θ̂₁)] − 2(p₂ − p₁)
ΔBIC = 2[ℓ(θ̂₂) − ℓ(θ̂₁)] − (log n)(p₂ − p₁)
24
BIC Motivation
“Aim of Bayesian approach is to identify the model with the highest probability of being the true model”
Kuha 2004
“The assumed purpose of the BIC-selected model was often simple prediction, as opposed to scientific understanding of the system under study”
Burnham & Anderson 2002
25
BIC Motivation
BF₂₁ = p(D | M₂) / p(D | M₁)
Bayes Factor = evidence in favor
of model 2 over model 1
BIC is an approximation of a transformation of
the Bayes Factor (for a limited set of priors).
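That transformation is easy to sketch: the BIC difference approximates 2·log of the Bayes factor, so exp(ΔBIC/2) approximates the factor itself. The BIC values below are hypothetical, and the approximation holds only for the limited set of priors noted above:

```python
import math

def approx_bayes_factor(bic_1, bic_2):
    """Approximate Bayes factor for model 2 over model 1 from BIC values,
    using 2 * log(BF21) ~= BIC1 - BIC2 (valid only for certain priors)."""
    return math.exp((bic_1 - bic_2) / 2)

# Hypothetical BIC values: model 2 has the lower (better) BIC
print(approx_bayes_factor(433.8, 433.4))  # modest evidence for model 2
```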
26
BIC does not always need to be a good approximation of the Bayes Factor if it is used mainly to identify which of the models has the highest posterior probability.
27
Justification for BIC
BIC is consistent: if the true model is among the candidates, BIC selects it with probability approaching 1 as n → ∞.
28
Meaning of BIC Values
pi is the posterior probability that model i is the true model (assuming that there is a true model and that it is in your model set):

pi = exp(−ΔBICi / 2) / Σr=1..R exp(−ΔBICr / 2)
29
PDA Model
Each model includes a subset of the candidate predictors (linear, quadratic, and cubic time terms; HDRS; Tx; PDA1; Attendance), marked with x in the original slide:

Model   AIC    ΔAIC   BIC    P(T.M.)
1       −174   2      −112   0.002428
2       −141   35     −70    1.84e−12
3       −166   10     −100   6.02e−06
4       −151   25     −99    3.65e−06
5       −176   0      −124   0.979624
6       −158   18     −116   0.017942
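The P(T.M.) column can be checked from the BIC column alone. A minimal sketch using the posterior-probability formula from the previous slide (equal prior model probabilities assumed); running it closely reproduces the table's values:

```python
import math

def bic_posterior_probs(bics):
    """Posterior model probabilities from BIC values, assuming equal prior
    odds and that the true model is in the set:
    p_i = exp(-dBIC_i/2) / sum_r exp(-dBIC_r/2)."""
    deltas = [b - min(bics) for b in bics]
    weights = [math.exp(-d / 2) for d in deltas]
    total = sum(weights)
    return [w / total for w in weights]

# BIC column from the PDA table above (models 1-6)
bics = [-112, -70, -100, -99, -124, -116]
probs = bic_posterior_probs(bics)
print([round(p, 6) for p in probs])  # model 5 dominates, near 0.98
```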
30
Similarities
• Penalized model selection criteria
• Data must be fixed
• They can be special cases of each other
• Both good at approximating target quantities
• Bayesian or frequentist derivation
• Ambivalence
• Only as good as your data

Differences
• BIC is dimension-consistent; AIC approximates relative information loss
• BIC penalizes complex models more than AIC
• Definition of “good model”
• Need for a true model
31
Burnham and Anderson
Objection to BIC
We question the concept of a simple “true model” in the biological sciences and would surely think, if it existed, that it would not be in the set of candidate models.
There is nothing in the foundation of BIC that addresses a bias-variance trade-off, and hence addresses parsimony as a feature of BIC model selection.
32
Others’ Views
For model selection purposes, there is no clear
choice between AIC and BIC.
Kuha 2002
The BIC target model doesn’t depend on N, but we know the number of parameters selected will, so BIC can’t deliver on its objective in practice.
KMC
33
“All models are wrong, some models are useful.”
George Box
Any model is just a simplification of reality.
Select a model that is a useful description or powerful predictor.
34
Simulation Results
• BIC better than AIC when the true
model is included as a candidate and
often better than AICc
• AIC does better when the true model is not in the set
• These are not universal results
35
Simulation Results
[Figure: relative error of selected models; Kuha 2004]
36
Alternative Approaches Exist
• Direct
• Cross-validation
• Use all the models at the same time!
• Report out top contenders
Train → Validate → Test
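The train/validate/test idea can be sketched with a plain random split. The 60/20/20 fractions below are illustrative assumptions, not a recommendation from the slides:

```python
import random

def train_validate_test_split(rows, f_train=0.6, f_val=0.2, seed=0):
    """Shuffle rows and split into train/validate/test subsets.
    The remainder after train and validate goes to test."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * f_train)
    n_val = int(len(shuffled) * f_val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, validate, test = train_validate_test_split(list(range(100)))
print(len(train), len(validate), len(test))  # 60 20 20
```

Models are fit on the training set, compared on the validation set, and the winner's error is reported once on the untouched test set.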
37
Recommendations
• Establish a philosophy
• Conduct thoughtful model building
• Use MS stats as a guide only
• Use multiple stats simultaneously
38
Elephant Model
39
Objectives
• Know what AIC and BIC do.
• Know the role that some statisticians
think AIC and BIC should play
in research.
• Be aware of alternatives.
• Be motivated to learn more about AIC/BIC
40
© David Farley
41
Restricted Space and Directed Selection
• Akaike believed that the most important contribution of his general approach was the clarification of the importance of modeling and the need for substantial prior information on the system being studied.
• The importance of carefully defining a small set of candidate models cannot be overemphasized. (A & B 2002)