34
Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: Understand and document uncertainties from data collection to publication

Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Embed Size (px)

Citation preview

Page 1: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Uncertainty• How “certain” of the data are we?• How much “error” does it contain?• How well does the model match reality?• Goal:

– Understand and document uncertainties from data collection to publication

Page 2: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

LifeMapper: Tamarix chinensis

LifeMapper.org

Page 3: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

LifeMapper: Loggerhead Turtles

LifeMapper.org

Page 4: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Sample Data

Predictor Layers

Modeling Software

• Spatial Precision• Spatial Accuracy • Sample Bias• Identification Errors• Date problems• Gross Errors• Gridding

• Over fitting?• Assumptions?

Response Curves Model Performance MeasuresNumber of ParametersAIC, AICc, BIC, AUC

• Match expectations?• Over-fit?

What is the best model?

Habitat Map

• Realistic?• Uncertainty maps?

• How to determine?

Settings

Road Map of Uncertainty

• Accurate measures?

• Noise• Correlation• Interpolation Error• Spatial Errors• Measurement Errors• Temporal Uncertainty

Page 5: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Occurrences of Polar Bears

From The Global Biodiversity Information Facility (www.gbif.org, 2011)

Page 6: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

January 1st Dates

• If you put just a “year”, like 2011, into a relational database, the database will return:– Midnight, January 1st, of that year

• In other words:– 2011 becomes:– 2011-01-01 00:00:00.00

Page 7: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Predictor Layers• Remotely sensed:

– DEMs, Visual, IR, NIR, SST, NPP, Sea Height

• Integrated from multiple data sets:– Bathymetry

• Interpolated:– Temp, precipitation, wind– DO, Sub-surface temp, salinity, bottom type

• Processed from other layers:– Slope, aspect, distance to shore

• From other models:– Climate

Page 8: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

BioClim/WorldClim Data

0.00

0.20

0.40

0.60

0.80

1.00

1.20-231

-219

-207

-195

-183

-172

-160

-148

-136

-124

-112

-100 -88

-77

-65

-53

-41

-29

-17 -5 7 19 30 42 54 66 78 90 102

Num

ber o

f Pix

els

Scal

ed to

1

Degrees C Times 10

Min Temp of Coldest Month

Page 9: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Red Snapper

• An important recreational and commercial species

• $7 - 70 million/year

outdooralabama.com

www.safmc.net

lickyourownbowl.wordpress.com

Page 10: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

SEAMAP Trawls (>47,000 records)

Red Snapper Occurrences (>6,000 records)

Page 11: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Density ofPlatforms and Pipelines

Created from Bureau of Ocean Energy Management (BOEM) Point Data Set

Uncertainty?

BathymetryResampled from 90m

NOAA Coastal Inundation DatasetAnd others

Uncertainty < 9km

Net Primary Production (NPP)Milligrams of Carbon per

Meter Squared per DayOSU Ocean Productivity

Uncertainty?

Sea Surface Temperature (SST)NOAA

AVHRR Pathfinder SatelliteSpatial: 9km

Measured: <0.4 K

Page 12: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Sample Data• Spatial Precision

– Standard Deviation of about 2 km

• Spatial Accuracy: – < 1km

• Sampling Bias: – Some areas more heavily sampled

• Identification Errors: – Unknown but red snapper are pretty easy to identify

• Date problems: – Not a temporal model

• Gridding: – Not at 9 kilometers

Page 13: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Jiggling The Samples

• Randomly shifting the position of the points based on a given standard deviation based on sample uncertainty

• Running the model repeatedly to see the potential effect of the uncertainty

Page 14: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Predictor Resolution

• Should model at the lowest resolution of the predictor layers or lower

• Lower resolution:– Can reduce problems from sample data

uncertainty– In same cases can combine sample data

into measures such as CPUE, density, abundance

– Reduces detail of final model• For these tests: 9km

Page 15: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Histograms of Predictor Layers

0123456789

10

0.00

0.04

0.08

0.12

0.16

0.20

0.23

0.27

0.31

0.35

0.39

0.43

0.47

0.51

0.55

0.59

0.62

0.66

0.70

0.74

ln(N

umbe

r of

Pix

els+

1)

Axis Title

Density of Infrastructure

0

1

2

3

4

5

6

7

8

310

598

885

1,17

31,46

11,74

92,03

72,32

42,61

22,90

03,18

83,47

53,76

34,05

14,33

94,62

74,91

45,20

25,49

05,77

86,06

56,35

3

ln(N

umbe

r of P

ixel

s+1)

NPP (mgc per meter^2 per day)

Net Primary Productivity Histogram

0

1

2

3

4

5

6

7

8

-16

-171

-326

-481

-636

-791

-946

-1,101

-1,256

-1,411

-1,566

-1,721

-1,876

-2,031

-2,186

-2,341

-2,496

-2,651

-2,806

-2,961

-3,116

-3,271

-3,426

-3,581

-3,736

-3,891

ln(N

umbe

r of

Pix

els+

1)

Depth

Bathymetry Histogram

Page 16: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Predictor Layer Uncertainty• Noise: Minimal• Correlation: Eliminate NPP• Spatial Errors: Unknown, < 9km?• Measurement errors: Minimal to

Unknown

Spearman’s CoefficientNPP to SST

Correlation: 0.99

Page 17: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Modeling Software: Maxent

• Popular, relatively easy to use• Only requires presence points• Known for over-fitting• Performs “piece-wise” regression• Assumes:

– Predictors are error free– Independence of errors in response– Lack of correlation in predictors– Constant variance– Random data collection over sample area

Page 18: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Experiments

• Different regularization multipliers– Select regularization for other tests

• Try different numbers of samples– Determine the number of samples to use

• “Jiggle” the sample data spatially– Introduce different amounts of spatial error

Page 19: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Regularization=0.1

Page 20: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Regularization=1.2

Page 22: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

0.937

0.938

0.939

0.94

0.941

0.942

0.943

0.944

0.945

0.946

0.947

70500

71000

71500

72000

72500

73000

73500

74000

74500

0.1 1.2 2.3 3.4 4.5 5.6 6.7 7.8 8.9 10

AUC Values

AIC,

AIC

C, B

IC V

alue

s

Regularization Values

Regularization ResultsAIC AICC BIC AUC

Regularization=0.1 Regularization=1.2 Regularization=10

Page 23: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Percent Number of Samples Number of Parameters AIC AUC

10% 665 136 7549 0.947

20% 1330 166 13,258 0.941

30% 1995 120 16,850 0.936

50% 3326 159 23,839 0.929

70% 4656 162 28,293 0.924

100% 6651 81 72,901 0.918

0

10000

20000

30000

40000

50000

60000

70000

80000

0.90.9050.91

0.9150.92

0.9250.93

0.9350.94

0.9450.95

10 20 30 50 70 100

AIC

AUC

Percent of Total Points

Impact of Sample Points on Performance Measures

AUC AIC

Page 24: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

10%SamplePoints

100%Sample

Points

PoorHabitat

OptimalHabitat

Page 25: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

No Jiggling

Std Dev=4.4km

Std Dev=55km

Jiggling

Page 26: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Jiggling

• Spearman’s Correlation

0 vs. 4.4km, Correlation = .997 0 vs. 55km, Correlation = .919

Page 27: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Uncertainty Maps

• Standard Deviation of Jiggling Points by 4.4km

0.0008 0.32

Page 28: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Conclusions

• We can improve HSMs over the default settings in Maxent

• We can determine uncertainty in some cases

• Next Steps– Support uncertainty for each data point– Improve uncertainty analysis and

visualization for predictor layers– Begin adding uncertainty analysis and maps

to products

Page 29: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Communicating Uncertainy

Colleen Sullivan, 2012

Page 30: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

References• Anderson, Model Based Inference in the Life Sciences: A Primer on

Evidence• AVHRR Technical Specifications: http

://data.nodc.noaa.gov/pathfinder/Version5.2/GDS_TechSpecs_v2.0.pdf• Chil’s JP, Delfiner P , Geostatistics: Modeling Spatial Uncertainty • Franklin J, Mapping Species Distributions: Spatial Inference and

Prediction. Cambridge University Press, Cambridge• Hunsaker CT et. al., Spatial Uncertainty in Ecology: Implications for

Remote Sensing and GIS Applications • Maxent: http://www.cs.princeton.edu/~schapire/maxent/• Phillips SJ, Anderson RP, Schapire RE (2006) Maximum entropy

modeling of species geographic distributions. Ecol Model 190 (3-4):231-259.

• Warren DL, Seifert SN (2011) Ecological niche modeling in Maxent: the importance of model complexity and the performance of model selection criteria. Ecological Applications 21 (2):335-342.

• Wenzhong S, Principles of Modeling Uncertainties in Spatial Data and Spatial Analyses

Page 31: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Other Issues…

• Selecting background (absence points)• Clamping:

– Prevents prediction outside the predictors used to create the model

• Thresholds

Page 33: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Communicating Uncertainy

Colleen Sullivan, 2012

Page 34: Uncertainty How “certain” of the data are we? How much “error” does it contain? How well does the model match reality? Goal: –Understand and document uncertainties

Staircase of Knowledge

Increasing Subjectivity

Hum

an v

alue

add

ed

ObservationAnd

Measurement

Data

Information

Knowledge

Understanding

Wisdom

OrganizationInterpretation

Verification

SelectionTesting

ComprehensionIntegration

Judgment

Environmental Monitoring and Characterization, Aritola, Pepper, and Brusseau