Basic Statistical Concepts M. Burgman & J. Carey 2002

Statistical Population

• The entire underlying set of individuals from which samples are drawn.

e.g. if 0.25 m² quadrats are used to count barnacles on a sea shore, the population is the set of all such quadrats on that shore.

• The population is defined implicitly by the sampling frame.

Strategies

• Define survey objectives

• Define population parameters to estimate

• Implement sampling strategy

i) measure every individual (limited by cost, time and practicality, especially if measurement is destructive)

ii) measure a representative portion of the population (a sample)

Statistical Sample

• An aggregate of objects from which measurements are taken.

• A representative subset of a population.

Simple Random Sampling

• Every unit and combination of units in the population has an equal chance of selection.

a) with replacement

b) without replacement

c) finite and infinite populations
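The difference between sampling with and without replacement can be sketched in a few lines of Python. This is a minimal illustration only: the frame of 100 numbered quadrats, the function name `simple_random_sample` and the seed are assumptions made for the example, not part of the original slides.

```python
import random

def simple_random_sample(units, k, replace=False, seed=None):
    """Draw k units from a finite population by simple random sampling."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same unit may be drawn more than once.
        return rng.choices(units, k=k)
    # Without replacement: each unit can appear at most once.
    return rng.sample(units, k)

# Hypothetical sampling frame: 100 numbered quadrats.
quadrats = list(range(1, 101))

without_repl = simple_random_sample(quadrats, 10, replace=False, seed=1)
with_repl = simple_random_sample(quadrats, 10, replace=True, seed=1)
```

Without replacement the ten quadrats are always distinct; with replacement duplicates are possible, which matters most when the population is small relative to the sample.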

Sampling Objectives

• To obtain an unbiased estimate of a population mean

• To assess the precision of the estimate (i.e. calculate the standard error of the mean)

• To obtain as precise an estimate of the parameters as possible for time and money spent

Statistics of Dispersion

Population variance: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{n}$

Sample variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$

Sample standard deviation: $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$

Statistics of Dispersion

Standard error of the mean: $s_{\bar{x}} = \sqrt{\dfrac{s^2}{n}}$

Coefficient of variation: $CV = \dfrac{s}{\bar{x}}$

Covariance: $s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
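The dispersion statistics named on these two slides (sample variance, standard deviation, standard error of the mean, coefficient of variation and covariance) can be computed with the Python standard library. The sketch below is illustrative: the function names are assumptions, the jaw lengths are the male golden jackal values listed later in these slides, and the paired `x`, `y` values are hypothetical.

```python
import math
import statistics

def dispersion_summary(x):
    """Sample statistics of dispersion, using the n - 1 divisor."""
    n = len(x)
    mean = statistics.mean(x)
    s2 = statistics.variance(x)   # sample variance
    s = math.sqrt(s2)             # sample standard deviation
    se = s / math.sqrt(n)         # standard error of the mean
    cv = s / mean                 # coefficient of variation (x 100 for %)
    return {"mean": mean, "variance": s2, "sd": s, "se": se, "cv": cv}

def sample_covariance(x, y):
    """Sample covariance of paired observations, using the n - 1 divisor."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

# Male golden jackal jaw lengths (mm) from the Randomization Tests slides.
males = [120, 107, 110, 116, 114, 111, 113, 117, 114, 112]
summary = dispersion_summary(males)

# Hypothetical paired data, used only to exercise the covariance function.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
cov_xy = sample_covariance(x, y)
```

Note that the covariance of a variable with itself is its sample variance, which gives a quick consistency check on both functions.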

Expectations and Variances

E(X+b) = E(X) + b

E(aX) = aE(X)

E(X+Y) = E(X) + E(Y)

V(X+b) = V(X)

V(aX) = a²V(X)

V(X+Y) = V(X) + V(Y) + 2Cov(X,Y)
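These rules can be checked numerically on any small data set. The sketch below uses hypothetical paired values and population (divisor n) variances and covariance; the helper `pcov` and the constants `a` and `b` are assumptions made for the illustration.

```python
import math
import statistics

def pcov(x, y):
    """Population covariance (divisor n), matching statistics.pvariance."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)

# Hypothetical paired values, chosen only for illustration.
x = [2.0, 4.0, 4.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0]
a, b = 3.0, 10.0

E, V = statistics.mean, statistics.pvariance

# Each entry pairs the left-hand side of a rule with its right-hand side.
checks = {
    "E(X+b)": (E([xi + b for xi in x]), E(x) + b),
    "E(aX)": (E([a * xi for xi in x]), a * E(x)),
    "E(X+Y)": (E([xi + yi for xi, yi in zip(x, y)]), E(x) + E(y)),
    "V(X+b)": (V([xi + b for xi in x]), V(x)),
    "V(aX)": (V([a * xi for xi in x]), a ** 2 * V(x)),
    "V(X+Y)": (V([xi + yi for xi, yi in zip(x, y)]),
               V(x) + V(y) + 2 * pcov(x, y)),
}

all_hold = all(math.isclose(lhs, rhs) for lhs, rhs in checks.values())
```

The last rule shows why the covariance term cannot be dropped unless X and Y are uncorrelated.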

Confidence Limits

• For the mean: $\bar{x} \pm t_{[\alpha,\, n-1]} \dfrac{s}{\sqrt{n}}$

• This formula sets confidence limits to means of samples from a normally distributed population.
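As a worked illustration, the limits (sample mean plus or minus t times the standard error of the mean) can be computed for the ten male jaw lengths listed in the Randomization Tests slides. This is a sketch under stated assumptions: the function name is hypothetical, and the t value (2.262 for a two-tailed α of 0.05 with 9 degrees of freedom) is taken from standard tables rather than computed.

```python
import math
import statistics

def mean_confidence_limits(x, t_value):
    """Confidence limits for the mean: mean +/- t * s / sqrt(n)."""
    n = len(x)
    mean = statistics.mean(x)
    se = statistics.stdev(x) / math.sqrt(n)   # standard error of the mean
    half_width = t_value * se
    return mean - half_width, mean + half_width

# Male golden jackal jaw lengths (mm) from the Randomization Tests slides.
males = [120, 107, 110, 116, 114, 111, 113, 117, 114, 112]

# Assumed table value: two-tailed Student's t, alpha = 0.05, n - 1 = 9 df.
T_05_9 = 2.262

lower, upper = mean_confidence_limits(males, T_05_9)
```

Passing the t value in as a parameter keeps the sketch free of third-party dependencies; a statistics library could supply it instead.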

Confidence Limits

• Confidence limits of the mean define a region that we expect will enclose the true mean.

• The likelihood that this is true is determined by α. If we set α at 5% (hence specifying 95% confidence intervals), then the region enclosed by the confidence intervals will capture the true mean in about 95 of every 100 repeated samples.
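The "95 times out of 100" interpretation can be checked by simulation. The sketch below repeatedly draws samples of 10 from a normal population with a known mean and counts how often the interval contains it. The population parameters, number of trials, seed and function name are all hypothetical choices; the t value of 2.262 (two-tailed α = 0.05, 9 degrees of freedom) is an assumed table value.

```python
import math
import random
import statistics

def coverage(true_mean=50.0, true_sd=10.0, n=10, t_value=2.262,
             trials=2000, seed=42):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.mean(sample)
        half_width = t_value * statistics.stdev(sample) / math.sqrt(n)
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / trials

rate = coverage()
```

With these settings the returned rate should sit close to 0.95, varying slightly with the seed and the number of trials.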

Confidence Limits

• The same formula may be used to set confidence limits to any statistic, as long as it follows the normal distribution, e.g. the median, the average (absolute) deviation, the standard deviation (s), the coefficient of variation, or skewness.

How many samples?

$n = \dfrac{t^2 \, CV^2}{E^2}$

where:

• CV is the coefficient of variation (expressed as a %) of samples in a pilot survey

• t is Student's t value for a specified degree of certainty and the number of samples used to estimate the parameters

• E is the specified error limit (expressed as a % of the mean)
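The sample-size calculation, n = t² CV² / E², is simple arithmetic once the three inputs are known. The sketch below uses hypothetical inputs (a pilot CV of 40%, an error limit of 20% of the mean, and an assumed table value of t = 2.262 for 95% certainty with a pilot survey of 10 samples); the function name is also an assumption.

```python
import math

def required_samples(cv_percent, error_percent, t_value):
    """Samples needed so the error limit is E% of the mean: t^2 CV^2 / E^2."""
    n = (t_value ** 2) * (cv_percent ** 2) / (error_percent ** 2)
    # Round up, because a fraction of a sample cannot be taken.
    return math.ceil(n)

# Hypothetical pilot-survey values.
n_needed = required_samples(cv_percent=40.0, error_percent=20.0, t_value=2.262)
```

Here t² CV² / E² = 5.1166 × 1600 / 400 ≈ 20.5, so 21 samples would be needed. Halving E quadruples the required n, because E enters the formula squared.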

Measurement Error

• Measured variation may be decomposed into

natural variation + measurement error

• Measurement error may be reduced by improving sampling protocols and instrumentation

• Reducing measurement error increases confidence in estimates without increasing the number of samples.

• Precision (variation) v. accuracy (bias)
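A small simulation can illustrate the decomposition. It assumes, beyond what the slides state, that measurement error is random, unbiased and independent of the true values, in which case the variances add. The standard deviations, sample size and seed are hypothetical.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical standard deviations for the two sources of variation.
natural_sd = 2.0   # natural variation among individuals
error_sd = 1.0     # random measurement error

# True values vary naturally; each measurement adds an independent error.
true_values = [rng.gauss(100.0, natural_sd) for _ in range(20000)]
measured = [v + rng.gauss(0.0, error_sd) for v in true_values]

natural_var = statistics.variance(true_values)
measured_var = statistics.variance(measured)
expected_var = natural_sd ** 2 + error_sd ** 2   # 4 + 1 = 5
```

Under these assumptions `measured_var` lands close to `expected_var`, and shrinking `error_sd` narrows the spread of the measurements without any change in the number of samples. A systematic error (bias) would instead shift the mean and would not appear in this variance at all.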

Components of Measurement Error

• Systematic errors

• Random errors

Causes

• Measurement assumptions

(shape, size, allometry)

• Instrument error

• Operator error

Kinds of Uncertainty

1. Epistemic Uncertainty

• inherent environmental variation

• variation in population responses due to demographic structure

• imperfect knowledge

• model mis-specification

• measurement error (assessment error)

• ignorance


2. Semantic Uncertainty

• Ambiguity - interpretation of a phrase in two or more distinct ways.

“Juvenile Court to Try Shooting Defendant”

“Local High School Dropouts Cut in Half”

• Vagueness - leads to borderline cases.

e.g. tall; endangered; adult


More examples of vagueness:

• Tree crown

tree foliage bounded by the first healthy branch forming part of the main crown and extending as far or further than any branch above it.

forked trees? dead branches?


More examples of vagueness:

• Epilimnion

the upper layer of water in a lake, bounded by a thermocline

• Soil horizon

a relatively uniform soil layer, differentiated by contrasts in mineral or organic properties.

Sampling Design Criteria

• Operational simplicity

• Unambiguous interpretation

Null-Hypothesis Tests

An example of hypothesis testing in which management alternatives are judged on the basis of the outcome of the test.

Hypothesis               Symbol   Description

Null hypothesis          H0       The strategy has no effect.

Alternative hypothesis   H1       The strategy is effective.

Statistical Outcomes in Null Hypothesis Testing

                          Test result

Reality                   Significant           Not significant
                          (H0 rejected)         (H0 not rejected)

Difference (H0 false)     correct               Type II error (β)

No difference (H0 true)   Type I error (α)      correct

The Character of Error Types

Type I errors

• Alarmism/Over-reaction

• Incorrectly accepting a (false) alternative hypothesis

• Concluding (incorrectly) that there is an impact

Type II errors

• False confidence/Cornucopia

• Incorrectly "accepting" a (false) null hypothesis

• Concluding (incorrectly) that there is no impact

t-tests

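A t-test of the hypothesis that two sample means come from populations with equal means, i.e. H0: $\mu_1 = \mu_2$

$t = \dfrac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\dfrac{1}{n}\left(s_1^2 + s_2^2\right)}}$

where n is the size of each sample.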
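The statistic can be computed directly from two samples. The sketch below assumes the equal-sample-size form, t = (Ȳ1 - Ȳ2) / sqrt((s1² + s2²) / n), and applies it to the golden jackal jaw lengths from the Randomization Tests slides; the function name is hypothetical.

```python
import math
import statistics

def t_equal_n(y1, y2):
    """t statistic for two samples of equal size n."""
    if len(y1) != len(y2):
        raise ValueError("this form of the statistic assumes equal sample sizes")
    n = len(y1)
    diff = statistics.mean(y1) - statistics.mean(y2)
    # (s1^2 + s2^2) / n is the squared standard error of the difference.
    se_diff = math.sqrt((statistics.variance(y1) + statistics.variance(y2)) / n)
    return diff / se_diff

# Golden jackal jaw lengths (mm) from the Randomization Tests slides.
males = [120, 107, 110, 116, 114, 111, 113, 117, 114, 112]
females = [110, 111, 107, 108, 110, 105, 107, 106, 111, 111]

t_stat = t_equal_n(males, females)
```

The result is compared with Student's t distribution on 2(n - 1) = 18 degrees of freedom. Unequal sample sizes need a different form of the denominator, which this sketch deliberately rejects.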

Distributions of Test Statistics

[Figure: the distribution of the mean of the actual population and the distribution under the null hypothesis (assumed to be true until rejected), plotted as P(statistic), with the critical value marked.]

Assumptions

The assumption of independence: correlation and autocorrelation

1. If error in one object is related to error in others, there will be bias, e.g. when one object is measured and the others are judged by comparison with it.

2. The effective sample size may be less than the number of samples if measurements are correlated in space or time.

The effects of the non-independence of data on errors of interpretation of statistical tests

              Non-independence

Correlation   Among treatments      Within treatments

Positive      Increased Type II     Increased Type I

Negative      Increased Type I      Increased Type II

Randomization Tests

Jaw lengths of Golden Jackals:

Males: 120, 107, 110, 116, 114, 111, 113, 117, 114, 112

Females: 110, 111, 107, 108, 110, 105, 107, 106, 111, 111

Is there a difference in jaw length between males and females?

1. Calculate means for males and for females.

2. Calculate the difference between the means, $D_0 = \bar{x}_m - \bar{x}_f = 4.8$

3. Randomly allocate 10 of the 20 sample lengths to each of 2 groups.

4. Calculate $D_i$, the difference between means for these 2 groups.

5. Repeat Steps 3 & 4 many times.
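The five steps above can be sketched in Python as follows. This is a minimal one-sided version that counts randomized differences at least as large as the observed one; the function name, seed and small floating-point tolerance are assumptions, and because the allocations are random the exact count will differ from run to run and from the figures reported on the slides.

```python
import random

males = [120, 107, 110, 116, 114, 111, 113, 117, 114, 112]
females = [110, 111, 107, 108, 110, 105, 107, 106, 111, 111]

def mean(values):
    return sum(values) / len(values)

def randomization_test(group_a, group_b, runs=5000, seed=2002):
    """Return (D0, proportion of randomized Di >= D0)."""
    rng = random.Random(seed)
    d0 = mean(group_a) - mean(group_b)     # Steps 1 and 2
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(runs):                  # Step 5
        rng.shuffle(pooled)                # Step 3: random allocation
        d_i = mean(pooled[:n_a]) - mean(pooled[n_a:])   # Step 4
        # The tolerance guards against floating-point rounding in ties.
        if d_i >= d0 - 1e-9:
            extreme += 1
    return d0, extreme / runs

d0, p_value = randomization_test(males, females)
```

Shuffling the pooled list and splitting it at position 10 is equivalent to allocating 10 of the 20 lengths to each group at random. A two-sided test would instead count allocations where the absolute difference is at least as large as the absolute value of D0.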

Randomization Tests

• If D0 is unusually large compared with the distribution of the Di values, the observed data are unlikely to have arisen if there was no difference between males and females.

Randomization Tests

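[Figure: histogram of the randomized differences in jaw length (mm), running from about -4 to 4 on the horizontal axis, with frequency (0 to 600) on the vertical axis; the observed D0 = 4.8 is marked.]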

Randomization Tests

• From 5000 runs, only 9 values of Di were greater than or equal to 4.8.

• 9/5000 = 0.0018.

(t-test: $p_{H_0} = 0.0013$)

Confidence Limits by Randomization

• For 95% confidence limits, the upper and lower limits, U and L, are such that they enclose 95% of the randomization distribution.

• For 99% confidence, L and U must give values at the 0.5% and 99.5% points on the distribution.
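Reading the limits off a randomization distribution amounts to finding percentile points in a list of simulated values. The sketch below shows only that step, using `statistics.quantiles` from the standard library; how the distribution itself is generated for a given statistic is not specified on the slides, so the normally distributed stand-in values, the function name and the seed are hypothetical.

```python
import random
import statistics

def percentile_limits(values, confidence=0.95):
    """Return (L, U) enclosing the central `confidence` share of values."""
    alpha = 1.0 - confidence
    # Cut the distribution into 2/alpha equal-probability intervals:
    # 40 for 95% limits (2.5% steps), 200 for 99% limits (0.5% steps).
    n_cuts = round(2.0 / alpha)
    cuts = statistics.quantiles(values, n=n_cuts)
    # The first and last cut points sit at alpha/2 and 1 - alpha/2.
    return cuts[0], cuts[-1]

# Stand-in for a randomization distribution (hypothetical values).
rng = random.Random(7)
simulated = [rng.gauss(0.0, 1.0) for _ in range(5000)]

L95, U95 = percentile_limits(simulated, 0.95)
L99, U99 = percentile_limits(simulated, 0.99)
inside_95 = sum(L95 <= v <= U95 for v in simulated) / len(simulated)
```

The 99% limits are necessarily wider than the 95% limits, since they must enclose more of the same distribution.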

Randomization Tests

Randomization tests can be done in lieu of:

• paired comparisons

• ANOVA

• multiple regression
