Developing O/E (Observed-to-Expected) Models for Assessing Biological Condition
Chuck Hawkins
Western Center for Monitoring and Assessment of Freshwater Ecosystems
Utah State University
11 May 2006
National Water Quality Monitoring Council
5th National Monitoring Conference, San Jose, California
Content of Short Course
• O/E as a concept.
• E: simple idea, not so easy to estimate.
• Sampling, probabilities of capture, and E.
• E as a function of taxon-specific probabilities of capture.
• Predicting E: it only hurts for a while!
• Errors, inferences, and two types of assessments.
• O/E and the WSA – understanding the numbers.
What is O/E?
O/E is a measure of the taxonomic completeness of the biological community observed at a site.
E = 8 taxa, O = 3 taxa
O/E = 3 / 8 = 0.38
E: simple idea, not so easy to estimate:
Accurately and precisely describing the biota expected in different waterbodies.
Hypothetical Variation in Probabilities of Capture of Several
Taxa Along One Natural Environmental Gradient
[Figure axes: Temperature (x), Probability of Capture (y)]
The challenge is compounded because taxa pc’s are simultaneously controlled by several natural factors.
A Segue into the Messy Issue of Probabilities
• Sampling means uncertainty!
• Sampling error, probabilities of capture, and E.
[Figure: Number of Taxa (y) versus Sampling Effort (x), with labels for Lab sub-sample, Field Sample, and Complete Census]
The actual composition and number of taxa in any given sample will have a random component.
E as a function of taxon-specific probabilities of capture
Although E = 8 taxa is a true statement, the picture of distinct composition is misleading.
The real composition associated with E is actually a bit fuzzy.
[Figure labels: PC = 1, PC = 0.8, PC = 0.5, PC = 0.2]
Sampling and Probabilities of Capture
Taxon      Detections in 10 replicate samples   Freq (pc)
Baetis     10                                   1.0
Perla       8                                   0.8
Corixa      5                                   0.5
Drunella    5                                   0.5
Epeorus     1                                   0.1
Sp Count per replicate (samples 1-10): 3 3 3 2 4 3 2 2 4 3; mean = 2.9
E = ∑ pc = mean number of taxa per sample = 2.9.
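As a minimal sketch of the arithmetic on this slide, the snippet below turns the detection counts from the replicate-sample table into pc's and sums them to get E. The variable names are illustrative only; the cross-check uses the "Sp Count" row from the same table.

```python
# Detections out of 10 replicate samples, taken from the table above.
detections = {"Baetis": 10, "Perla": 8, "Corixa": 5, "Drunella": 5, "Epeorus": 1}
n_replicates = 10

# Probability of capture (pc) = fraction of replicates in which the taxon appeared.
pc = {taxon: count / n_replicates for taxon, count in detections.items()}

# E is the sum of the taxon-specific pc's.
expected_richness = sum(pc.values())

# Cross-check: mean number of taxa per replicate (the "Sp Count" row).
taxa_per_replicate = [3, 3, 3, 2, 4, 3, 2, 2, 4, 3]
mean_richness = sum(taxa_per_replicate) / len(taxa_per_replicate)

print(round(expected_richness, 2), round(mean_richness, 2))
```

Both numbers agree because summing frequencies over taxa and averaging counts over samples are two ways of adding up the same detections.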
How O/E is Calculated:
Sum of taxa pc’s estimates the number of taxa (E) that should be observed given standard sampling.
Taxon pc O
Atherix 0.92 ●
Baetis 0.86 ●
Caenis 0.70
Drunella 0.63
Epeorus 0.51 ●
Farula 0.32
Gyrinus 0.07
Hyalella 0.00
E 4.01 3
O/E = 3 / 4.01 = 0.75
[Table columns O2 and O3: two further samples with different taxa present, each also totaling 3 observed taxa.]
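The calculation above can be sketched in a few lines. The pc values and the first observed sample come from the table; the second sample (`alternative_sample`) is a hypothetical composition added only to illustrate that samples with different taxa can yield the same O/E. The function name `o_over_e` is likewise an assumption.

```python
# pc values from the worked example above.
pc = {"Atherix": 0.92, "Baetis": 0.86, "Caenis": 0.70, "Drunella": 0.63,
      "Epeorus": 0.51, "Farula": 0.32, "Gyrinus": 0.07, "Hyalella": 0.00}

def o_over_e(pc, observed):
    """Return (O, E, O/E) for one sample given taxon probabilities of capture."""
    expected = sum(pc.values())                           # E = sum of pc's
    n_observed = sum(1 for taxon in observed if taxon in pc)  # O = expected taxa found
    return n_observed, expected, n_observed / expected

# Sample marked in the O column of the table.
observed_sample = {"Atherix", "Baetis", "Epeorus"}
# Hypothetical alternative composition (not from the slide).
alternative_sample = {"Baetis", "Caenis", "Drunella"}

o, e, ratio = o_over_e(pc, observed_sample)
o_alt, e_alt, ratio_alt = o_over_e(pc, alternative_sample)
print(o, round(e, 2), round(ratio, 2))
```

O/E counts how many of the expected taxa were found, so two samples with the same count score the same even when their composition differs.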
Predicting E: it only hurts for a while!
• Two basic approaches:
– Model many individual species (logistic regression models) and then combine the many predictions.
– Model a few assemblage types and then ‘back out’ probabilities of capture for individual species.
– We do the latter.
Yes, explaining how E is predicted can be a little complicated.
“In layman’s terms? I’m afraid I don’t know any layman’s terms.”
E = (Mc² × R²) / (6.673×10⁻¹¹ m³ kg⁻¹ s⁻²)
The basic approach to estimating pc’s from predictions of assemblage type was worked out several years ago.
Moss, D., M. T. Furse, J. F. Wright, and P. D. Armitage. 1987. The prediction of the macro-invertebrate fauna of unpolluted running-water sites in Great Britain using environmental data. Freshwater Biology 17:41-52.
Empirical modeling that derives predictions of the probability of capturing a species at a new location from observations at ‘reference’ sites.
A primer is on our web page: www.cnr.usu.edu/wmc
Three Major Steps in Estimating E
1. Classify reference sites based on their biological similarity.
2. Predict the class of a new site from environmental attributes with a discriminant functions model.
3. Weight frequencies of occurrences of taxa within classes by the site’s probabilities of class membership to estimate pc’s and then E.
Classifying Reference Sites (sites within classes are seldom spatially clustered)
[Figure: cluster dendrogram AZ_56_RefSites_logAbundance of reference sites R1-R56; axes: Distance (Objective Function), 3.8E-02 to 1.1E+01, and Information Remaining (%), 100 to 0]
[Figure: classes A-H arranged by environmental features: Cool Water, High Elevation vs. Warm Water, Low Elevation; Limestone vs. Granite watershed; Spring vs. Stream; Small, Medium, and Big streams]
Environmental features associated with biologically defined classes.
Hydropsyche 100%, Caenis 95%, Baetis 90%, Tricorythodes 80%, Drunella grandis 70%
Baetis 100%, Drunella grandis 85%, Arctopsyche 80%, Neophylax 75%, Optioservus 70%
Baetis 95%, Epeorus 90%, Simulium 90%, Arctopsyche 75%, Zapada 70%
We could use these numbers to estimate E at a new site belonging to one of these stream ‘types’, but…what if the site is ‘medium-big’, etc.?
[Figure labels: classes A, B, C; stream sizes Small, Medium, Big; Granite]
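Percentages like those listed for each class are frequencies of occurrence among the reference sites in that class. The sketch below shows one way to tabulate them; the site records, class labels, and the function name `class_frequencies` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical reference-site records: site -> (class label, taxa collected).
reference_sites = {
    "R1": ("A", {"Baetis", "Hydropsyche", "Caenis"}),
    "R2": ("A", {"Hydropsyche", "Caenis"}),
    "R3": ("B", {"Baetis", "Drunella grandis"}),
    "R4": ("B", {"Baetis", "Arctopsyche"}),
}

def class_frequencies(sites):
    """Fraction of sites in each class at which each taxon was collected."""
    sites_per_class = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for class_label, taxa in sites.values():
        sites_per_class[class_label] += 1
        for taxon in taxa:
            hits[class_label][taxon] += 1
    return {
        c: {t: n / sites_per_class[c] for t, n in taxa_hits.items()}
        for c, taxa_hits in hits.items()
    }

freq = class_frequencies(reference_sites)
print(freq["A"]["Hydropsyche"], freq["A"]["Baetis"])
```

A taxon never collected in a class simply has no entry for that class, which is treated as a frequency of zero.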
Biologically Defined Reference Classes: Class A, Class B, Class C, Class D
Reference Site Predictor Variables: Catchment Area, Geology, Latitude, Longitude, Elevation, etc.
Discriminant Analysis → Discriminant Model
Discriminant Functions Models Classify New Sites in Terms of Their Probabilities
of Class Membership
[Diagram: Predictor Variable Values → Discriminant Model]
By Weighting Taxon Frequencies of Occurrence within a Class by the Probabilities of Class Membership, We Can Estimate Individual Taxon Probabilities of Capture
Class   Probability of Class Membership   Frequency of Taxon in Class   Contribution to PC
A       0.5                               0.6                           0.30
B       0.4                               0.2                           0.08
C       0.1                               0.0                           0.00
D       0.0                               0.0                           0.00
Probability of Taxon Being in a Sample if the Site is in Reference Condition = 0.38
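The weighting on this slide is a sum of products: for each class, the site's probability of class membership times the taxon's frequency in that class. A minimal sketch using the slide's numbers follows; the function name `probability_of_capture` is an assumption.

```python
# Values from the slide: class membership probabilities for one new site
# and the frequency of one taxon within each reference class.
membership = {"A": 0.5, "B": 0.4, "C": 0.1, "D": 0.0}
taxon_freq = {"A": 0.6, "B": 0.2, "C": 0.0, "D": 0.0}

def probability_of_capture(membership, taxon_freq):
    """Sum over classes of P(class) * frequency of the taxon in that class."""
    return sum(p * taxon_freq.get(c, 0.0) for c, p in membership.items())

pc = probability_of_capture(membership, taxon_freq)
print(round(pc, 2))
```

Because the membership probabilities sum to 1, the result is a weighted average of the class frequencies, so a site that is intermediate between classes gets an intermediate pc.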
The model estimates the pc’s of every taxon (i.e., those observed in at least one reference site sample) at every assessed site, not just a few as shown here for illustration.
Also, if a taxon is predicted to have a pc of zero, it does not contribute to O!
O/E = 3 / 4.07 = 0.74
Taxon pc O
Atherix 0.70 *
Baetis 0.92 *
Caenis 0.86
Drunella 0.63
Epeorus 0.51 *
Farula 0.38
Gyrinus 0.07
Hyalella 0.00 *
E 4.07 3
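The table shows four taxa collected but O = 3, because Hyalella has a pc of zero. The sketch below applies that rule to the table's values; the variable names are illustrative.

```python
# pc values and collected taxa (marked *) from the table above.
pc = {"Atherix": 0.70, "Baetis": 0.92, "Caenis": 0.86, "Drunella": 0.63,
      "Epeorus": 0.51, "Farula": 0.38, "Gyrinus": 0.07, "Hyalella": 0.00}
collected = {"Atherix", "Baetis", "Epeorus", "Hyalella"}

expected = sum(pc.values())

# Counting every collected taxon would overstate O.
naive_count = len(collected)

# Only taxa with a nonzero predicted pc contribute to O.
observed = sum(1 for taxon in collected if pc.get(taxon, 0.0) > 0.0)

ratio = observed / expected
print(naive_count, observed, round(expected, 2), round(ratio, 2))
```

Restricting O to taxa the model expects keeps the numerator and denominator on the same taxon list.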
Errors, inferences, and two types of assessments.
• Model error.
• Inferring site condition.
• Inferring regional conditions.
Statistical Issues Regarding Inferences of Impairment (Single Samples)
• Statistical Hypothesis: Is the observed O/E value for a single sample from the same distribution of values estimated for reference sites, i.e., the site is either equivalent to reference or not.
We should ideally set a threshold value to balance Type I and II errors. Easy to set Type I error, but Type II errors are problematic. 10th and 90th percentiles of reference site values have been used.
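A percentile-based threshold of the kind mentioned here could be sketched as follows. The reference O/E values are hypothetical (not WSA data), and the function name `assess` and its labels are assumptions made for illustration.

```python
import statistics

# Hypothetical O/E values at reference sites (illustrative only).
reference_oe = [0.72, 0.78, 0.85, 0.88, 0.91, 0.94, 0.96, 0.98, 1.00, 1.01,
                1.03, 1.05, 1.07, 1.09, 1.12, 1.15, 1.18, 1.22, 1.27, 1.31]

# Nine cut points dividing the data into deciles; the first and last
# are the 10th and 90th percentiles.
deciles = statistics.quantiles(reference_oe, n=10)
lower, upper = deciles[0], deciles[-1]

def assess(oe):
    """Compare one site's O/E with the reference 10th and 90th percentiles."""
    if oe < lower:
        return "below reference range"
    if oe > upper:
        return "above reference range"
    return "within reference range"

print(assess(0.38), "|", assess(1.0))
```

Using the 10th percentile as the lower threshold fixes the Type I error rate near 10% for that tail; it says nothing about the Type II error rate, which is the difficulty the slide points out.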
How Good can a Model Be?
• SD of O/E values calculated at reference quality sites is a measure of overall model error.
– Part sampling error
– Part prediction error (random and systematic)
• A model can be no more precise than random sampling error.
• A model should be no worse than a null model – i.e., assume all sites have similar biota.
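One way to make the null-model benchmark concrete is sketched below. It assumes the null model assigns every site the same pc for a taxon, namely that taxon's overall frequency across all reference sites; the slide does not spell out this construction, and the sample data are hypothetical.

```python
import statistics

# Hypothetical taxa lists from four reference-site samples.
reference_samples = [
    {"Baetis", "Perla", "Corixa"},
    {"Baetis", "Perla"},
    {"Baetis", "Drunella", "Perla", "Corixa"},
    {"Baetis", "Drunella"},
]
n_sites = len(reference_samples)
all_taxa = set().union(*reference_samples)

# Assumed null model: one pc per taxon, its frequency across all reference sites.
null_pc = {t: sum(t in s for s in reference_samples) / n_sites for t in all_taxa}
null_e = sum(null_pc.values())  # the same E for every site

# O/E at each reference site under the null model, and its spread.
null_oe = [len(s) / null_e for s in reference_samples]
null_sd = statistics.stdev(null_oe)

print(round(null_e, 2), round(statistics.mean(null_oe), 2), round(null_sd, 2))
```

A predictive model earns its complexity only if the SD of its reference-site O/E values is smaller than `null_sd`.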
For Regional Assessments, We Want to Compare the Distribution of Observed O/E Values Among
Sites with the Expected Distribution
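[Figure: distribution of O/E values (x-axis: O/E, reference value 1) expected if all sites are in reference condition, compared with the actual distribution; chart of Stream Miles split into Ref, Fair, and Poor with shares of 35%, 40%, and 25% (pairing of labels and shares not recoverable)]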
Statistical Issues Regarding Inferences of Impairment (Multiple Sites and Replicated Samples at a Site)
• Statistical Hypothesis: is the observed mean different from 1 (the reference mean)? This test allows us to ask questions regarding how impaired a site or population of sites is. Sensitivity of the test is a function of model precision and sample size. Methods for balancing Type I and II errors are well worked out. Replicate samples at a site allow estimation of confidence limits around estimates of O/E.
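A minimal sketch of this test with replicate samples follows. The O/E values are hypothetical, and the confidence limits use a normal quantile as a large-sample approximation; with only a handful of replicates, a t quantile would give wider and more appropriate limits.

```python
import math
import statistics

# Hypothetical replicate O/E values from one assessed site.
oe_values = [0.71, 0.78, 0.66, 0.80, 0.74, 0.69]

n = len(oe_values)
mean_oe = statistics.mean(oe_values)
std_error = statistics.stdev(oe_values) / math.sqrt(n)

# t statistic for the hypothesis that the mean O/E equals 1 (the reference mean).
t_stat = (mean_oe - 1.0) / std_error

# Approximate 95% confidence limits (normal quantile; an approximation for small n).
z = statistics.NormalDist().inv_cdf(0.975)
ci_low, ci_high = mean_oe - z * std_error, mean_oe + z * std_error

print(round(mean_oe, 2), round(ci_low, 2), round(ci_high, 2))
```

The standard error shrinks with the square root of the number of replicates, which is why test sensitivity depends on both model precision and sample size.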
O/E and the WSA – understanding the numbers.
• WSA reference sites
• How many models?
• Reference site classes
• Model predictors
• Model performance
• Assessment results
Great variability in geographic distribution of sites within classes.
Western Model used 30 classes of streams for modeling
Graphic courtesy of Pete Ode.
Predictor Variables
West               Plains             E. Highlands
Longitude          ---                Longitude
Elevation          Elevation          ---
Day of Year        Day of Year        Day of Year
Basin Area         Basin Area         Basin Area
Stream Slope       Stream Slope       ---
Air Temperature    Freeze-Free Days   Air Temperature
Log Precipitation  Log Precipitation  Wet Days
Model Performance
                West   Plains   E. Highlands
Validation
  Mean          0.99   0.95     0.99
  SD (model)    0.20   0.24     0.18
  SD (null)     0.26   0.30     0.22
Test Sites
  Mean          0.84   0.86     0.81
Biodiversity status of the Nation’s streams as measured by O/E.
Data summarized as % of stream miles in each of 4 O/E classes.
Concluding Remarks
• O/E has an intuitive biological meaning.
• It means the same thing everywhere.
• Its derivation and interpretation are independent of type and knowledge of stressors in the region.
• It is quantitative, but….