Testing Predictive Performance of Ecological Niche Models

A. Townsend Peterson, STOLEN FROM Richard Pearson

Niche Model Validation

• Diverse challenges …
– Not a single loss function or optimality criterion
– Different uses demand different criteria
– In particular, relative weights applied to omission and commission errors in evaluating models
• Nakamura: “which way is relevant to adopt is not a mathematical question, but rather a question for the user”
– Asymmetric loss functions

Where do I get testing data????

(after Araújo et al. 2005 Gl. Ch. Biol.)

Model calibration and evaluation strategies: resubstitution

[Diagram. Labels: All available data (100%); Calibration; Evaluation; Projection; Same region; Different region; Different time; Different resolution]

(after Araújo et al. 2005 Gl. Ch. Biol.)

Model calibration and evaluation strategies: independent validation

[Diagram. Labels: All available data (100%); Calibration; Evaluation; Projection; Same region; Different region; Different time; Different resolution]

(after Araújo et al. 2005 Gl. Ch. Biol.)

Model calibration and evaluation strategies: data splitting

[Diagram. Labels: Calibration data; Test data; 70%; 30%; Calibration; Evaluation; Projection; Same region; Different region; Different time; Different resolution]
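A minimal sketch of data splitting, assuming the conventional reading of the diagram (70% of records for calibration, 30% held back for testing). The function name `split_records`, the seed, and the site labels are hypothetical and only illustrate the idea.

```python
import random


def split_records(records, calibration_fraction=0.7, seed=0):
    """Randomly partition occurrence records into calibration and test sets."""
    shuffled = list(records)
    # A fixed seed keeps the illustrative split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_cal = round(len(shuffled) * calibration_fraction)
    return shuffled[:n_cal], shuffled[n_cal:]


# Hypothetical occurrence records, for illustration only.
records = [f"site_{i}" for i in range(10)]
calibration, test = split_records(records)
print(len(calibration), len(test))  # 7 3
```

A random split within one region tests the model only against data from the same sampling process; evaluation in a different region or time needs independently collected records.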

Types of Error

The four types of results that are possible when testing a distribution model

(see Pearson NCEP module 2007)

Presence-absence confusion matrix

                     Recorded present      Recorded (or assumed) absent
Predicted present    a (true positive)     b (false positive)
Predicted absent     c (false negative)    d (true negative)
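The four cells can be tallied directly from paired predictions and records. The sketch below assumes both are available as parallel lists of booleans; the function name `confusion_counts` and the data are hypothetical.

```python
def confusion_counts(predicted, recorded):
    """Return (a, b, c, d) from parallel lists of booleans.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    a = b = c = d = 0
    for pred, obs in zip(predicted, recorded):
        if pred and obs:
            a += 1
        elif pred and not obs:
            b += 1
        elif not pred and obs:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical test sites: model prediction versus field record.
predicted = [True, True, False, False, True, False]
recorded = [True, False, True, False, True, False]
print(confusion_counts(predicted, recorded))  # (2, 1, 1, 2)
```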

Thresholding

Selecting a decision threshold (p/a data)

(Liu et al. 2005 Ecography 28:385-393)

[Plot: Kappa (y-axis, 0 to 0.7) against threshold (x-axis, 0 to 1)]

Omission (proportion of presences predicted absent): c/(a + c)

Commission (proportion of absences predicted present): b/(b + d)
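The trade-off between the two error rates as the threshold moves can be sketched as follows. This assumes continuous model scores where a site is predicted present when its score is at or above the threshold; the function name `error_rates` and the scores are hypothetical.

```python
def error_rates(scores, recorded, threshold):
    """Return (omission, commission) for one decision threshold.

    Assumes at least one recorded presence and one recorded absence.
    """
    a = b = c = d = 0
    for score, obs in zip(scores, recorded):
        pred = score >= threshold
        if pred and obs:
            a += 1
        elif pred and not obs:
            b += 1
        elif not pred and obs:
            c += 1
        else:
            d += 1
    return c / (a + c), b / (b + d)


# Hypothetical model scores at three presences and three absences.
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
recorded = [True, True, True, False, False, False]
for t in (0.2, 0.5, 0.8):
    omission, commission = error_rates(scores, recorded, t)
    print(t, round(omission, 2), round(commission, 2))
# 0.2 0.0 0.67
# 0.5 0.33 0.33
# 0.8 0.67 0.0
```

Raising the threshold lowers commission at the cost of higher omission, which is why the choice depends on the relative weight the user gives to each error.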

Selecting a decision threshold (p-o data)

[Plot: omission rate (y-axis, 0 to 1) against threshold (x-axis, 0 to 100), with LPT and T10 marked]

Threshold-dependent Tests (= loss functions)

Presence-absence test statistics

Proportion (%) correctly predicted (or ‘accuracy’, or ‘correct classification rate’):

(a + d)/(a + b + c + d)

Cohen’s Kappa:

k = [(a + d) – ((a + c)(a + b) + (b + d)(c + d))/n] / [n – ((a + c)(a + b) + (b + d)(c + d))/n], where n = a + b + c + d
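Both statistics follow directly from the four cell counts. The sketch below is a worked example with hypothetical counts; the function names are illustrative, not taken from any particular package.

```python
def accuracy(a, b, c, d):
    """Proportion correctly predicted: (a + d) / n."""
    return (a + d) / (a + b + c + d)


def cohens_kappa(a, b, c, d):
    """Cohen's Kappa: agreement beyond that expected by chance."""
    n = a + b + c + d
    # Number of agreements expected by chance from the marginal totals.
    expected = ((a + c) * (a + b) + (b + d) * (c + d)) / n
    if n == expected:
        return 0.0  # degenerate table (all cases in one agreeing cell)
    return ((a + d) - expected) / (n - expected)


# Hypothetical counts: a = 40, b = 10, c = 5, d = 45.
print(accuracy(40, 10, 5, 45))      # 0.85
print(cohens_kappa(40, 10, 5, 45))  # 0.7
```

Here chance alone would be expected to produce 50 agreements out of 100, so the 85 observed agreements correspond to Kappa = (85 – 50)/(100 – 50) = 0.7.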

Presence-only test statistics

Proportion of observed presences correctly predicted (or ‘sensitivity’, or ‘true positive fraction’):

a/(a + c)

Proportion of observed presences incorrectly predicted (or ‘omission rate’, or ‘false negative fraction’):

c/(a + c)

Presence-only test statistics: testing for statistical significance

Leaf-tailed gecko (Uroplatus), U. sikorae

– Success rate: 4 of 7; proportion predicted present: 0.231; binomial p = 0.0546
– Success rate: 6 of 7; proportion predicted present: 0.339; binomial p = 0.008
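The binomial test asks how likely it is to get at least this many test presences right by chance, given the proportion of the study area predicted present. The sketch below assumes a one-sided exact binomial tail probability; the function name `binomial_tail` is hypothetical. Under that assumption it gives values close to those quoted above, and small differences may come from rounding of the reported proportions.

```python
from math import comb


def binomial_tail(successes, trials, proportion):
    """P(X >= successes) for X ~ Binomial(trials, proportion)."""
    return sum(
        comb(trials, k) * proportion**k * (1 - proportion) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Figures taken from the two U. sikorae results above.
p_first = binomial_tail(4, 7, 0.231)
p_second = binomial_tail(6, 7, 0.339)
print(round(p_first, 3), round(p_second, 3))
```

A model that predicts a larger proportion of the area as present needs a higher success rate to reach the same significance, because chance alone would then capture more test records.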

Absence-only test statistics

Proportion of observed (or assumed) absences correctly predicted (or ‘specificity’, or ‘true negative fraction’):

d/(b + d)

Proportion of observed (or assumed) absences incorrectly predicted (or ‘commission rate’, or ‘false positive fraction’):

b/(b + d)

AUC: a threshold-independent test statistic

sensitivity = a/(a + c) (1 – omission rate)

specificity = d/(b + d); 1 – specificity is the fraction of absences predicted present

[Figure: ROC plot of sensitivity (y-axis, 0 to 1) against 1 – specificity (x-axis, 0 to 1), alongside frequency histograms of predicted probability of occurrence (0 to 1) for the set of ‘absences’ and the set of ‘presences’]

Threshold-independent assessment: The Receiver Operating Characteristic (ROC) Curve

[Figure: three panels labelled A, B and C]

(check out: http://www.anaesthetist.com/mnm/stats/roc/Findex.htm)
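The area under the ROC curve (AUC) equals the probability that a randomly chosen presence receives a higher predicted value than a randomly chosen absence, with ties counted as one half. The sketch below uses that pairwise formulation rather than integrating the curve; the function name `auc` and the scores are hypothetical.

```python
def auc(presence_scores, absence_scores):
    """AUC as the fraction of presence/absence pairs ranked correctly."""
    wins = 0.0
    for p in presence_scores:
        for q in absence_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical predicted probabilities of occurrence.
presences = [0.9, 0.8, 0.6, 0.4]
absences = [0.7, 0.3, 0.2, 0.1]
print(auc(presences, absences))  # 0.875
```

An AUC of 0.5 corresponds to a model that ranks presences no better than chance, and 1.0 to perfect separation of the two sets.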