Basic Statistical Concepts
M. Burgman & J. Carey 2002
Statistical Population
• The entire underlying set of individuals from which samples are drawn.
e.g. 0.25 m² quadrats are used to count barnacles on a sea shore.
• The population is defined implicitly by the sampling frame.
Strategies
• Define survey objectives
• Define population parameters to estimate
• Implement sampling strategy
i) measure every individual (cost, time, practicality especially if destructive)
ii) measure a representative portion of the population (a sample)
Statistical Sample
• An aggregate of objects from which measurements are taken.
• A representative subset of a population.
Simple Random Sampling
• Every unit and combination of units in the population has an equal chance of selection.
a) with replacement
b) without replacement
c) finite and infinite populations
Sampling Objectives
• To obtain an unbiased estimate of a population mean
• To assess the precision of the estimate (i.e. calculate the standard error of the mean)
• To obtain as precise an estimate of the parameters as possible for time and money spent
Statistics of Dispersion
Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$
Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Standard error of the mean: $s_{\bar{x}} = \sqrt{\frac{s^2}{n}}$
Coefficient of variation: $CV = \frac{s}{\bar{x}}$
Covariance: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
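As a rough illustration, the dispersion statistics can be computed with Python's standard library. The paired values `x` and `y` below are hypothetical and serve only to show the arithmetic; the variable names are likewise illustrative.

```python
import math
import statistics

# Hypothetical paired measurements, used only to illustrate the formulas.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
n = len(x)

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

pop_var = statistics.pvariance(x)     # population variance: divides by n
sample_var = statistics.variance(x)   # sample variance: divides by n - 1
sample_sd = statistics.stdev(x)       # square root of the sample variance
std_err = math.sqrt(sample_var / n)   # standard error of the mean
cv = sample_sd / x_bar                # coefficient of variation, as a fraction

# Sample covariance, written out to mirror the formula term by term.
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
```

Multiplying `cv` by 100 gives the coefficient of variation as a percentage.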
Expectations and Variances
E(X+b) = E(X) + b
E(aX) = aE(X)
E(X+Y) = E(X) + E(Y)
V(X+b) = V(X)
V(aX) = a²V(X)
V(X+Y) = V(X) + V(Y) + 2Cov(X,Y)
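The variance rules can be checked numerically. The sketch below uses hypothetical paired values and sample moments (n - 1 divisor throughout), for which the identities hold exactly up to floating-point rounding; the helper `cov` and the constants `a` and `b` are illustrative.

```python
import statistics

def cov(u, v):
    """Sample covariance with an n - 1 divisor."""
    u_bar, v_bar = statistics.mean(u), statistics.mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / (len(u) - 1)

# Hypothetical paired values and constants.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
a, b = 3.0, 10.0

var_shift = statistics.variance([xi + b for xi in x])             # V(X + b)
var_scale = statistics.variance([a * xi for xi in x])             # V(aX)
var_sum = statistics.variance([xi + yi for xi, yi in zip(x, y)])  # V(X + Y)

# Right-hand side of V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y).
rhs_sum = statistics.variance(x) + statistics.variance(y) + 2 * cov(x, y)
```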
Confidence Limits
• For the mean: $\mu = \bar{x} \pm t_{[\alpha,\, n-1]} \, \frac{s}{\sqrt{n}}$
• This formula sets confidence limits to means of samples from a normally distributed population.
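A minimal sketch of the calculation, using the male Golden Jackal jaw lengths quoted in the Randomization Tests section of these notes. The critical value is an assumption taken from standard two-tailed t tables (α = 0.05, 9 degrees of freedom) rather than computed, so that the example needs only the standard library.

```python
import math
import statistics

# Male jaw lengths (mm) from the Golden Jackal example in these notes.
males = [120, 107, 110, 116, 114, 111, 113, 117, 114, 112]
n = len(males)

x_bar = statistics.mean(males)
std_err = statistics.stdev(males) / math.sqrt(n)   # s / sqrt(n)

# Two-tailed Student's t for alpha = 0.05 and n - 1 = 9 degrees of freedom,
# taken from standard tables (an assumption of this sketch).
T_CRIT = 2.262

lower = x_bar - T_CRIT * std_err
upper = x_bar + T_CRIT * std_err
```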
Confidence Limits
• Confidence limits of the mean define a region that we expect will enclose the true mean.
• The likelihood that this is true is determined by α. If we set α at 5% (hence specifying 95% confidence intervals), then, over repeated sampling, the region enclosed by the confidence intervals will capture the true mean 95 times out of 100.
Confidence Limits
• The same formula may be used to set confidence limits to any statistic as long as it follows the normal distribution, e.g. the median, the average (absolute) deviation, the standard deviation (s), the coefficient of variation, or skewness.
How many samples?
$n = \frac{t^2 \, CV^2}{E^2}$
where:
• CV is the coefficient of variation (expressed as a %) of samples in a pilot survey
• t is Student's t value for a specified degree of certainty and the number of samples used to estimate the parameters
• E is the specified error limits (expressed as a % of the mean)
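The sample-size formula is simple enough to sketch as a function. The name `samples_needed` and the input values are hypothetical; the result is rounded up because a fractional sample cannot be taken.

```python
import math

def samples_needed(t_value, cv_percent, error_percent):
    """Return n = t^2 * CV^2 / E^2, rounded up to a whole number of samples.

    cv_percent and error_percent are both expressed as percentages.
    """
    n = (t_value ** 2) * (cv_percent ** 2) / (error_percent ** 2)
    return math.ceil(n)

# Hypothetical pilot survey: CV of 40%, error limits of 10% of the mean,
# and t taken as roughly 2 for about 95% certainty.
n_required = samples_needed(2.0, 40.0, 10.0)
```

Because t itself depends on the number of samples, in practice the calculation may need repeating with an updated t value until n settles.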
Measurement Error
• Measured variation may be decomposed into
natural variation + measurement error
• Measurement error may be reduced by improving sampling protocols and instrumentation
• Reducing measurement error increases confidence in estimates without increasing the number of samples.
• Precision (variation) v. accuracy (bias)
Components of Measurement Error
• Systematic errors
• Random errors
Causes
• Measurement assumptions
(shape, size, allometry)
• Instrument error
• Operator error
Kinds of Uncertainty
1. Epistemic Uncertainty
• inherent environmental variation
• variation in population responses due to demographic structure
• imperfect knowledge
• model mis-specification
• measurement error (assessment error)
• ignorance
Kinds of Uncertainty
2. Semantic Uncertainty
• Ambiguity - interpretation of a phrase in two or more distinct ways.
“Juvenile Court to Try Shooting Defendant”
“Local High School Dropouts Cut in Half”
• Vagueness - leads to borderline cases.
e.g. tall; endangered; adult
Kinds of Uncertainty
More examples of vagueness:
• Tree crown
tree foliage bounded by the first healthy branch forming part of the main crown and extending as far or further than any branch above it.
forked trees? dead branches?
Kinds of Uncertainty
More examples of vagueness:
• Epilimnion
the upper layer of water in a lake, bounded by a thermocline
• Soil horizon
a relatively uniform soil layer, differentiated by contrasts in mineral or organic properties.
Sampling Design Criteria
• Operational simplicity
• Unambiguous interpretation
Null-Hypothesis Tests
An example of hypothesis testing in which management alternatives are judged on the basis of the outcome of the test.
Hypothesis              Symbol   Description
Null hypothesis         H0       The strategy has no effect.
Alternative hypothesis  H1       The strategy is effective.
Statistical Outcomes in Null Hypothesis Testing
                          Test result
Reality                   Significant (H0 rejected)   Not significant (H0 not rejected)
Difference (H0 false)     correct                     Type II error (β)
No difference (H0 true)   Type I error (α)            correct
The Character of Error Types
Type I errors
• Alarmism/Over-reaction
• Incorrectly accepting a (false) alternative hypothesis
• Concluding (incorrectly) that there is an impact
Type II errors
• False confidence/Cornucopia
• Incorrectly "accepting" a (false) null hypothesis
• Concluding (incorrectly) that there is no impact
t-tests
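A t-test of the hypothesis that two sample means come from a population with equal μ,
i.e. H0: μ1 = μ2 (as written, for two samples of equal size n):
$t = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\frac{1}{n}\left(s_1^2 + s_2^2\right)}}$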
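A minimal sketch of this t statistic, applied to the Golden Jackal jaw lengths quoted in the Randomization Tests section of these notes. It assumes equal sample sizes, as the formula does, and stops at the statistic and its degrees of freedom; looking up the probability is left to tables or a statistics package.

```python
import math
import statistics

# Jaw lengths (mm) from the Golden Jackal example in these notes.
males = [120, 107, 110, 116, 114, 111, 113, 117, 114, 112]
females = [110, 111, 107, 108, 110, 105, 107, 106, 111, 111]

n = len(males)
if len(females) != n:
    raise ValueError("this form of the t statistic assumes equal sample sizes")

diff = statistics.mean(males) - statistics.mean(females)

# Denominator: sqrt((s1^2 + s2^2) / n), using sample variances (n - 1 divisor).
se_diff = math.sqrt((statistics.variance(males) + statistics.variance(females)) / n)

t_stat = diff / se_diff
df = 2 * (n - 1)   # degrees of freedom for two samples of size n
```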
Distributions of Test Statistics
[Figure: two overlapping distributions plotted as P(statistic): the distribution of the null hypothesis (assumed to be true until rejected) and the distribution of the mean of the actual population, with the critical value marked.]
Assumptions
The assumption of independence: correlation and autocorrelation
1. If the error in one object is related to the error in others, there will be bias, e.g. measure one and compare others.
2. The effective sample size may be less than the number of samples if measurements are correlated in space or time.
The effects of the non-independence of data on errors of interpretation of statistical tests
Correlation   Non-independence among treatments   Non-independence within treatments
Positive      Increased Type II                   Increased Type I
Negative      Increased Type I                    Increased Type II
Randomization Tests
Jaw lengths of Golden Jackals:
Males: 120, 107, 110, 116, 114,
111, 113, 117, 114, 112
Females: 110, 111, 107, 108, 110,
105, 107, 106, 111, 111
Is there a difference in jaw length between males and females?
1. Calculate means for males and for females.
2. Calculate the difference between the means, $D_0 = \bar{x}_m - \bar{x}_f = 4.8$
3. Randomly allocate 10 sample lengths to each of 2 groups.
4. Calculate $D_i$, the difference between means for these 2 groups.
5. Repeat Steps 3 & 4 many times.
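The five steps above can be sketched as follows. The function name, the fixed seed and the default of 5000 runs are illustrative choices; the proportion returned will vary slightly with the seed and the number of runs.

```python
import random

# Jaw lengths (mm) from the Golden Jackal example.
males = [120, 107, 110, 116, 114, 111, 113, 117, 114, 112]
females = [110, 111, 107, 108, 110, 105, 107, 106, 111, 111]

def mean(values):
    return sum(values) / len(values)

def randomization_test(group_a, group_b, runs=5000, seed=1):
    """Return (D0, proportion of randomized differences Di >= D0)."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    d0 = mean(group_a) - mean(group_b)        # Steps 1 and 2
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(runs):                     # Step 5
        rng.shuffle(pooled)                   # Step 3: random allocation
        d_i = mean(pooled[:n_a]) - mean(pooled[n_a:])   # Step 4
        if d_i >= d0 - 1e-9:                  # tolerance guards against rounding
            count += 1
    return d0, count / runs

d0, proportion = randomization_test(males, females)
```

A small `proportion` indicates that a difference as large as D0 rarely arises when jaw lengths are allocated to the two groups at random.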
Randomization Tests
• If D0 is unusually large, the observed data are unlikely to have arisen if there was no difference between males and females.
Randomization Tests
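[Figure: histogram of the randomization distribution. x-axis: difference in jaw length (mm), from -4 to 4; y-axis: frequency, from 0 to 600; the observed difference D0 = 4.8 is marked.]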
Randomization Tests
• From 5000 runs, only 9 values of Di were greater than or equal to 4.8.
• 9/5000 = 0.0018.
(t-test: p(H0) = 0.0013)
Confidence Limits by Randomization
• For 95% confidence limits, the upper and lower limits, U and L, are such that they enclose 95% of the randomization distribution.
• For 99% confidence, L and U must give values at the 0.5% and 99.5% points on the distribution.
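As a rough sketch only, the percentile points of a randomization distribution can be read off a sorted list of simulated differences. This assumes L and U are taken directly as percentile points of the distribution generated by random allocation of the jackal data; the helper names, the seed and the nearest-rank rule are illustrative, and this is not offered as a complete procedure for limits on the true difference.

```python
import random

# Jaw lengths (mm) from the Golden Jackal example.
males = [120, 107, 110, 116, 114, 111, 113, 117, 114, 112]
females = [110, 111, 107, 108, 110, 105, 107, 106, 111, 111]

def randomization_distribution(group_a, group_b, runs=5000, seed=1):
    """Return the sorted differences between means under random allocation."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(runs):
        rng.shuffle(pooled)
        a, b = pooled[:n_a], pooled[n_a:]
        diffs.append(sum(a) / len(a) - sum(b) / len(b))
    return sorted(diffs)

def percentile_point(sorted_values, fraction):
    """Nearest-rank value at the given fraction (0 to 1) of a sorted list."""
    index = round(fraction * (len(sorted_values) - 1))
    return sorted_values[index]

diffs = randomization_distribution(males, females)

# 95% limits enclose the central 95% of the distribution.
lower_95 = percentile_point(diffs, 0.025)
upper_95 = percentile_point(diffs, 0.975)

# 99% limits use the 0.5% and 99.5% points.
lower_99 = percentile_point(diffs, 0.005)
upper_99 = percentile_point(diffs, 0.995)
```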
Randomization Tests
Randomization tests can be used in lieu of:
• paired comparisons
• ANOVA
• multiple regression