DAIDD, Gainesville, Florida, December 2013

Jim Scott, Ph.D., M.A., M.P.H.


By the end of this talk, I hope you’ll:

Have a good sense of what model evaluation is, why it’s important, and how it’s tied to your research question

Know some of the characteristics that are desirable in models


Specific question

Identify relevant factors and information

Model formulation

Mathematics

Evaluation


“E” topics: Edison, Thomas; Epicycle; Epidemiology; Elasticity; Eratosthenes; Euler; Existence; Evolution; Extrapolation; Eradication

“M” topics: Malthus, Thomas; Mars; Maternity; Maximum likelihood; Maxwell, James C.; Misery; Monte Carlo; Moon; Mortality; Mumps

Let’s look at a few different types of models

[Figures: three example models]

Source: Wikipedia

Source: XKCD


[Figure: Influence of Candidate Height on Presidential Elections. Scatter plot of percent of the popular vote (30 to 70) against candidate height in cm (160 to 200). Each point is a labeled candidate, from Thomas Jefferson and John Adams through Barack Obama and John McCain.]

Estimated Percent Popular Vote = 1.07 + 0.26 * Height

Source: Whitlock & Schluter, The Analysis of Biological Data
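The fitted line is plain arithmetic in height, so evaluating it makes the model's claim concrete. This is a small sketch using only the coefficients on the slide. The heights plugged in are arbitrary illustrative values, not any candidate's data.

```python
def predicted_vote_share(height_cm):
    """Fitted line from the slide: Estimated % popular vote = 1.07 + 0.26 * height (cm)."""
    return 1.07 + 0.26 * height_cm

# Arbitrary illustrative heights, not taken from the candidate data
for h in (170, 180, 190):
    print(h, "cm ->", round(predicted_vote_share(h), 2), "% of popular vote")
```

Each extra 10 cm of height adds 2.6 percentage points to the prediction. The line describes a pattern in the data; it says nothing about why taller candidates might draw more votes.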

The Krebs Cycle

The Eye

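[Figure: Compartmental model of HIV with susceptible (S) and infected (I) classes, N = S + I, a birth rate, an infection rate acting on SI/N, and Weibull-distributed mortality of infected individuals. Inset: probability of surviving versus time (0 to 30 years) for two curves, Normal (Weibull 2) and Exponential (Weibull 1).]

Slide credit: J. Hargrove/B. Williams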
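The inset contrasts two Weibull survival curves. Here is a minimal sketch, assuming the standard Weibull survival function S(t) = exp(−(t/scale)^shape). It also uses a hypothetical median survival of 10 years, which is not a value taken from the slide. Shape 1 gives exponential survival (constant hazard), and shape 2 gives a hazard that rises with time since infection.

```python
import math

def weibull_survival(t, shape, median):
    """P(still alive at time t) for a Weibull distribution with the given shape and median."""
    # Pick the scale so that S(median) = 0.5: scale = median / (ln 2)^(1/shape)
    scale = median / math.log(2) ** (1.0 / shape)
    return math.exp(-((t / scale) ** shape))

MEDIAN_YEARS = 10.0  # hypothetical median survival, for illustration only

for t in (0, 5, 10, 20, 30):
    s1 = weibull_survival(t, shape=1, median=MEDIAN_YEARS)  # "Exponential (Weibull 1)"
    s2 = weibull_survival(t, shape=2, median=MEDIAN_YEARS)  # "Normal (Weibull 2)"
    print(f"t = {t:>2} yr   shape 1: {s1:.3f}   shape 2: {s2:.3f}")
```

Both curves cross 0.5 at the median. The shape-2 curve stays higher early and falls faster later, which is the qualitative difference the inset shows.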

Consider the previous examples

Talk with someone next to you

Come up with a list of characteristics that a “good” model should have

For example, you might say “simplicity”


Accurate (i.e. low bias)

A model is accurate if estimates based on the model match the truth.

E.g., models used to predict the weather are reasonably accurate when predicting tomorrow’s weather, but much less accurate at predicting the weather further into the future.

Does the model fit the data?

Source: Wikipedia

Farr initially believed in the miasma theory of disease transmission: disease was propagated by “bad air”

The higher the elevation, the better the air

Mortality at terrace level X = (Mortality at terrace level 1) / X
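Farr's rule is a one-line formula. This sketch tabulates what it predicts. The level-1 mortality used here is a hypothetical number chosen only to show the shape of the relationship: doubling the terrace level halves the predicted mortality.

```python
def farr_mortality(mortality_level_1, level):
    """Farr's elevation rule as stated on the slide: mortality at level X = M(level 1) / X."""
    if level < 1:
        raise ValueError("terrace levels are numbered from 1")
    return mortality_level_1 / level

M1 = 100.0  # hypothetical mortality at terrace level 1 (illustrative units)

for level in range(1, 6):
    print(f"level {level}: predicted mortality {farr_mortality(M1, level):.1f}")
```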

[Figure: Boiling Point of Water vs Pressure. Scatter plot of temperature (°F, 195 to 215) against pressure (inches of Hg, 20 to 30).]

r = 0.9972; Est. Temp = 155.3 + 1.90 * Pressure

Source: Wikipedia

Accurate (i.e. low bias)

Descriptively realistic

A model is descriptively realistic if it is derived from a correct description of the mechanism involved in whatever is being modeled

Corollary: the underlying assumptions are correct

Purely statistical models are not descriptively realistic
▪ For example, a linear equation only models a pattern in the data; nothing tells us what is going on behind the scenes

An SIR model is more descriptively realistic
▪ A mechanism for transmission is specified

[Figure: Ln P (3.0 to 3.4) plotted against InverseT (about 0.0047 to 0.0052).]

Source: Wikipedia

Accurate (i.e. low bias)

Descriptively realistic

Precise (i.e. low variability)

A model is precise if the estimates it produces have low variability.

E.g., a model that estimates it will start to rain in the next 3 to 6 hours is more precise than one that estimates it will start to rain in the next 3 to 6 days.
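Accuracy and precision can be made concrete by comparing repeated estimates with a known true value. In this sketch, bias (accuracy) is the mean error, and variability (precision) is the sample standard deviation. All numbers are hypothetical. Model A is centered on the truth but scattered, while model B is tightly clustered but consistently off.

```python
from statistics import mean, stdev

def bias(estimates, truth):
    """Accuracy: average error of the estimates relative to the true value."""
    return mean(estimates) - truth

def variability(estimates):
    """Precision: spread (sample standard deviation) of the estimates."""
    return stdev(estimates)

TRUE_VALUE = 10.0                         # hypothetical quantity being estimated
model_a = [9.8, 10.1, 10.0, 10.2, 9.9]    # hypothetical: accurate, less precise
model_b = [12.0, 12.1, 11.9, 12.0, 12.0]  # hypothetical: precise, but biased

for name, est in (("A", model_a), ("B", model_b)):
    print(f"model {name}: bias = {bias(est, TRUE_VALUE):.3f}, SD = {variability(est):.3f}")
```

The two properties are separate: a model can score well on one and poorly on the other.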

Source: Weekly U.S. Influenza Surveillance Report, http://www.cdc.gov/flu/weekly/

I(a,t) is the incidence at age a at time t

P(a,t) is the age-specific prevalence at age a at time t

* Simplified model

Accurate (i.e. low bias)

Descriptively realistic

Precise (i.e. low variability)

General

A model is general if it applies to a wide variety of situations, e.g., the law of supply and demand

Source: Wikipedia

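S → I → R

$$\frac{dS}{dt} = -\beta\,\frac{SI}{N}, \qquad \frac{dI}{dt} = \beta\,\frac{SI}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I$$

(β: transmission rate; γ: recovery rate)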
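The SIR equations can be integrated numerically in a few lines. This is a minimal forward-Euler sketch using standard SIR notation, with transmission rate beta and recovery rate gamma. The values beta = 0.3/day, gamma = 0.1/day and N = 1000 are hypothetical, and the step size dt is an illustrative choice rather than a recommendation. Because each flow leaves one compartment and enters the next, S + I + R stays equal to N.

```python
def simulate_sir(beta, gamma, s0, i0, r0=0.0, dt=0.1, days=160.0):
    """Forward-Euler integration of dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I."""
    n = s0 + i0 + r0  # total population, constant in this closed model
    s, i, r = s0, i0, r0
    history = [(0.0, s, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_infections = beta * s * i / n * dt  # flow S -> I during this step
        new_recoveries = gamma * i * dt         # flow I -> R during this step
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history

# Hypothetical parameters: R0 = beta / gamma = 3, one initial case in 1000 people
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=999.0, i0=1.0)
peak_time, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
t_end, s_end, i_end, r_end = trajectory[-1]
print(f"peak I = {peak_infected:.0f} at day {peak_time:.1f}")
print(f"ever infected by day {t_end:.0f}: {i_end + r_end:.0f}")
```

Forward Euler is the simplest possible choice. A smaller dt or a higher-order solver gives more accurate trajectories.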

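[Figure: Sufficient-component cause (“causal pie”) model: several pies built from component causes labeled A, B, X and Z, with X appearing in every pie.]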

Each pie represents a sufficient cause for disease (i.e. disease is inevitable)

Each letter represents a component cause for a disease

The component cause X is a necessary cause (i.e. disease cannot occur without it)
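The pie logic maps naturally onto sets. Disease occurs once every component of at least one pie is present. A component is necessary exactly when it lies in the intersection of all pies. The pies below are hypothetical, since the slide's exact pie contents are not reproduced here. X appears in all of them, so X comes out as the only necessary cause.

```python
# Hypothetical sufficient causes ("pies"), each a set of component causes
PIES = [
    frozenset({"A", "B", "X"}),
    frozenset({"A", "X", "Z"}),
    frozenset({"B", "X", "Z"}),
]

def disease_occurs(present):
    """Disease is inevitable once every component of at least one pie is present."""
    return any(pie <= set(present) for pie in PIES)

def necessary_causes(pies):
    """A component is necessary if it appears in every sufficient cause."""
    return frozenset.intersection(*pies)

print(sorted(necessary_causes(PIES)))
print(disease_occurs({"A", "B", "X"}))  # completes the first pie
print(disease_occurs({"A", "B", "Z"}))  # no X, so no pie can be completed
```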

Accurate (i.e. low bias)

Descriptively realistic

Precise (i.e. low variability)

General

Robust

A model is robust if it is relatively immune to errors in the data and/or to small violations of model assumptions

Is the model very sensitive to relatively small changes in estimated input parameters?
▪ If so, the model is NOT robust

Do model predictions remain accurate even when some key assumptions do not strictly hold?
▪ If so, the model IS robust
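The first question can be checked with a one-at-a-time sensitivity analysis: nudge an input by a small percentage and see how much the output moves. As the model being probed, this sketch uses the textbook final-size relation of a simple SIR epidemic, z = 1 − exp(−R0·z), where z is the fraction ever infected. The R0 values and the 5% perturbation are hypothetical. An elasticity well above 1 means small input errors are amplified.

```python
import math

def final_size(r0, tol=1e-12):
    """Fraction ever infected in a simple SIR epidemic: root of z = 1 - exp(-r0 * z)."""
    if r0 <= 1:
        return 0.0  # no major outbreak

    def f(z):
        return z - 1 + math.exp(-r0 * z)  # negative below the root, positive above it

    lo, hi = 1e-9, 1.0  # bracket the non-trivial root by bisection
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def elasticity(r0, rel_change=0.05):
    """Approximate % change in final size per % change in r0 (central difference)."""
    up = final_size(r0 * (1 + rel_change))
    down = final_size(r0 * (1 - rel_change))
    return (up - down) / final_size(r0) / (2 * rel_change)

for r0 in (1.2, 3.0):  # hypothetical reproduction numbers
    print(f"R0 = {r0}: final size = {final_size(r0):.3f}, elasticity = {elasticity(r0):.2f}")
```

The same output is far more sensitive near R0 = 1 than at R0 = 3, so how robust a model is can depend on where in parameter space it is used.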

[Figure: Three panels plotting % disease attributable to water against poor hygiene and poor sanitation. Panels: no shedding of pathogens (contamination) into the water (contamination parameter = 0); moderate contamination (= 1.0); very high contamination (= 2.0).]

Accurate (i.e. low bias)

Descriptively realistic

Precise (i.e. low variability)

General

Robust

Simple / Parsimonious

A model is parsimonious if it can “accomplish a lot without much”, e.g., a model that selects a relatively small number of the most useful parameters

Simple isn’t always better

The research question should drive the complexity of the model

[Figure (repeated): Boiling Point of Water vs Pressure; r = 0.9972; Est. Temp = 155.3 + 1.90 * Pressure. Source: Wikipedia]

[Figure: Residuals (about −1.5 to 0.5) plotted against fitted values (195 to 215) for the linear fit.]

Hmmm…

What does this mean?

Outlier!

Strong evidence of non-linearity

. regress bpt pressure pressure2

      Source |       SS       df       MS              Number of obs =      16
-------------+------------------------------           F(  2,    13) = 18942.88
       Model |  527.718935     2  263.859468           Prob > F      =  0.0000
    Residual |  .181079777    13  .013929214           R-squared     =  0.9997
-------------+------------------------------           Adj R-squared =  0.9996
       Total |  527.900015    15  35.1933344           Root MSE      =  .11802

------------------------------------------------------------------------------
         bpt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pressure |   3.754805   .2026619    18.53   0.000      3.31698    4.192629
   pressure2 |  -.0358576   .0039463    -9.09   0.000     -.044383   -.0273322
       _cons |   131.7824   2.570509    51.27   0.000     126.2292    137.3357
------------------------------------------------------------------------------
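The summary statistics in the Stata output can be re-derived from its ANOVA table, which is a quick way to see how they fit together. This small sketch is in Python rather than Stata and uses only numbers copied from the regression output.

```python
import math

# Values copied from the Stata ANOVA table
ss_model, df_model = 527.718935, 2
ss_resid, df_resid = 0.181079777, 13
ss_total, df_total = 527.900015, 15

ms_model = ss_model / df_model                              # mean square, model
ms_resid = ss_resid / df_resid                              # mean square, residual
r_squared = ss_model / ss_total                             # share of variation explained
adj_r_squared = 1 - ms_resid / (ss_total / df_total)        # penalized for extra terms
f_stat = ms_model / ms_resid                                # overall F statistic
root_mse = math.sqrt(ms_resid)                              # residual SD, in degrees F
t_pressure = 3.754805 / 0.2026619                           # coefficient / std. error

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"F(2, 13)      = {f_stat:.2f}")
print(f"Root MSE      = {root_mse:.5f}")
print(f"t (pressure)  = {t_pressure:.2f}")
```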

[Figure: Quadratic fit of boiling point (°F, 195 to 215) against pressure (inches of Hg, 20 to 30).]

Est. Temp = 131.8 + 3.75 * Pressure − 0.036 * Pressure²
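Putting the linear and quadratic boiling-point equations side by side shows where they disagree. This is a minimal sketch. It uses the rounded linear coefficients from the earlier slide and the full-precision quadratic coefficients from the Stata output.

```python
def linear_bp(pressure):
    """Linear fit: Est. Temp (deg F) = 155.3 + 1.90 * Pressure (inches of Hg)."""
    return 155.3 + 1.90 * pressure

def quadratic_bp(pressure):
    """Quadratic fit using the Stata coefficients for _cons, pressure and pressure2."""
    return 131.7824 + 3.754805 * pressure - 0.0358576 * pressure ** 2

print("Pressure  Linear  Quadratic  Linear - Quadratic")
for p in range(20, 31, 2):
    lin, quad = linear_bp(p), quadratic_bp(p)
    print(f"{p:>8}  {lin:6.2f}  {quad:9.2f}  {lin - quad:+8.2f}")
```

The straight line sits above the curve at both ends of the pressure range and below it in the middle. That is the systematic pattern behind the curved residual plot.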

$$\frac{dI}{dt} = \beta\,\frac{SI}{N}$$

vs.

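Used as a model selection tool; penalizes models with excessive parameter spaces

AIC = 2k − 2 ln(L)

AICc = AIC + 2k(k + 1) / (n − k − 1)

where k is the number of estimated parameters, L is the maximized value of the likelihood, and n is the sample size

AICc is often used to avoid over-fitting when the sample size is small or the parameter space is large

Lower AICc → better balance between goodness of fit and parsimony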
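Both criteria are simple to compute once a model's maximized log-likelihood is known. This sketch codes the two formulas above. It also assumes normally distributed errors for a least-squares fit, so that ln(L) can be obtained from the residual sum of squares. As an example it uses the quadratic boiling-point fit (n = 16, residual SS = 0.181079777 from the Stata table). It counts k = 4 (three coefficients plus the error variance); conventions on whether to count the variance differ. AICc values are only comparable across models fitted to the same data.

```python
import math

def aic(k, log_likelihood):
    """AIC = 2k - 2 ln(L), with k estimated parameters and L the maximized likelihood."""
    return 2 * k - 2 * log_likelihood

def aicc(k, log_likelihood, n):
    """Small-sample correction: AICc = AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(k, log_likelihood) + 2 * k * (k + 1) / (n - k - 1)

def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood of a least-squares fit with normal errors (sigma^2 = RSS/n)."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)

# Quadratic boiling-point model from the Stata output; k = 4 is an assumed counting convention
ll = gaussian_log_likelihood(0.181079777, 16)
print(f"ln(L) = {ll:.3f}, AIC = {aic(4, ll):.3f}, AICc = {aicc(4, ll, 16):.3f}")
```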

“Best” model (#1)

“Mass action” model (#5)

$$\frac{dI}{dt} = \beta\,\frac{SI}{N}$$

Accurate (i.e. low bias)

Descriptively realistic

Precise (i.e. low variability)

General

Robust

Simple / Parsimonious

Useful

A model is useful if:
▪ its conclusions are useful
▪ it points the way to other good models

E.g., the HIV modeling exercise
▪ The early models weren’t necessarily accurate, but they were useful

[Figure (repeated): Compartmental model of HIV with S and I classes, a birth rate, an infection rate acting on SI/N, and Weibull-distributed mortality; survival inset comparing Normal (Weibull 2) and Exponential (Weibull 1).]

Slide credit: J. Hargrove/B. Williams

[Figure: Model output by year, 1980 to 2020: prevalence and incidence/mortality (axes 0 to 1.0 and 0 to 0.20).]

Slide credit: J. Hargrove/B. Williams

[Figure: Relative transmission (0 to 1.0) by year, 1985 to 2000, and the S-I model including control, with population N, a birth rate, mortality, and a time-varying control term C(t) acting on transmission.]

Slide credit: J. Hargrove/B. Williams

[Figure: Model output including control, by year, 1980 to 2020: prevalence and incidence/mortality (axes 0 to 0.4 and 0 to 0.06).]

Slide credit: J. Hargrove/B. Williams

Accurate (i.e. low bias)

Descriptively realistic

Precise (i.e. low variability)

General

Robust

Simple / Parsimonious

Useful

Inexpensive

Others???


It really depends on what your original research question was:
▪ Was the goal to accurately predict something?
▪ Was the goal to determine a relationship between two or more parameters?
▪ Was the goal to understand a system in general terms?
▪ Was the goal to test a hypothesis? Or to generate one?

Consider each of the models presented today:
▪ What are the good things about each model?
▪ What are the shortcomings of each model?

Final word:

Source: XKCD

Recommended

Concepts of Mathematical Modeling, Walter Meyer, McGraw-Hill, 1984

Probability and Statistics, Charles Stone, Duxbury, 1996

Modeling Infectious Diseases in Humans and Animals, Keeling and Rohani, Princeton, 2008

Mathematical Models for Communicable Diseases, Brauer and Castillo-Chavez, SIAM, 2013

The Analysis of Biological Data, Whitlock and Schluter, Roberts and Company, 2008

Wikipedia

XKCD