Inference in Biology

BIOL4062/5062

Hal Whitehead

• What are we trying to do?

• Null Hypothesis Significance Testing

• Problems with Null Hypothesis Significance Testing

• Alternatives:

– Displays, confidence intervals, effect size statistics

– Model comparison using information-theoretic approaches

– Bayesian analysis

• Methods of Inference in Biology

What are we trying to do?

• Descriptive or exploratory analyses

– What factors influence species diversity?

• Fitting predictive models

– Can we make global maps of species diversity?

• Challenging research hypotheses

– Is diversity inversely related to latitude?

The traditional approach: Null Hypothesis Significance Testing

• Formulate null hypothesis

• Formulate alternative hypothesis

• Decide on test statistic

• Collect data

• What is probability (P) of test statistic, or more extreme value, under null hypothesis?

• If P<α (usually 0.05) conclude:

– Reject null in favour of alternative

• If P>α conclude:

– Do not reject null hypothesis

Null Hypothesis Significance Testing: An example

• Formulate null hypothesis

– “Species diversity does not change with latitude”

• Formulate alternative hypothesis

– “Species diversity decreases with latitude”

• Decide on test statistic

– Correlation between diversity measure and latitude, r

• Collect data

– 405 measures of diversity at different latitudes

• What is probability (P) of test statistic, or more extreme value, under null hypothesis?

– r = -0.1762; P = 0.002 (one-sided)

• If P<α (usually 0.05) conclude:

– Reject “Species diversity does not change with latitude”
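The steps above can be sketched in code. This is only an illustration: the slides do not say how P was computed, so a one-sided permutation test is assumed here, and the data, the function names `pearson_r` and `one_sided_perm_p`, and the settings `n_perm` and `seed` are all hypothetical.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def one_sided_perm_p(x, y, n_perm=2000, seed=1):
    """Return (observed r, P that r is this negative or more under the null).

    Shuffling y breaks any link with x, which simulates the null hypothesis
    of no association. This is an assumed method, not the slides' method.
    """
    rng = random.Random(seed)
    r_obs = pearson_r(x, y)
    y_shuffled = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if pearson_r(x, y_shuffled) <= r_obs:
            count += 1
    # Add 1 to numerator and denominator so P is never exactly zero.
    return r_obs, (count + 1) / (n_perm + 1)

# Made-up data: diversity declines weakly with latitude, plus noise.
data_rng = random.Random(0)
latitude = [data_rng.uniform(0, 70) for _ in range(200)]
diversity = [4.0 - 0.01 * lat + data_rng.gauss(0, 0.8) for lat in latitude]

r, p = one_sided_perm_p(latitude, diversity)
alpha = 0.05
print(f"r = {r:.3f}, one-sided P = {p:.4f}")
print("Reject null" if p < alpha else "Do not reject null")
```

The final two lines are the whole of the NHST decision rule: everything the data say is reduced to which side of α the value of P falls on.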

Criticisms of Null Hypothesis Significance Testing (1)

• α is arbitrary

• Most null hypotheses are false, so why test them?

• Statistical significance is not equivalent to biological significance

– with large samples, statistical significance but not biological significance

– with small samples, biological significance but not statistical significance

• If statistical power is low, the null hypothesis will usually not be rejected when false

• Encourages arbitrary inferences when many tests carried out
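The point about sample size can be made concrete with a rough calculation. The sketch below assumes the Fisher z approximation for testing a correlation (the slides do not specify a test), and the correlations and sample sizes are invented for illustration; `approx_two_sided_p` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def approx_two_sided_p(r, n):
    """Approximate two-sided P for a correlation r computed from n points.

    Uses the Fisher z transformation: when the true correlation is zero,
    atanh(r) is roughly normal with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A biologically trivial correlation (r = 0.05) at increasing sample sizes.
for n in (50, 500, 5000, 50000):
    print(f"r = 0.05, n = {n:>5}: P = {approx_two_sided_p(0.05, n):.4f}")

# A strong correlation (r = 0.5) from a very small sample.
print(f"r = 0.50, n =    10: P = {approx_two_sided_p(0.5, 10):.4f}")
```

With enough data the trivial correlation becomes "significant", while the strong correlation from ten points does not, even though neither effect size changed.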

Criticisms of Null Hypothesis Significance Testing (2)

• Power analysis does not save NHST

– arbitrary, confounded with P-value

– “vacuous intellectual game” (Shaver 1993)

• Incomplete reporting and publishing

– only report statistically significant results

– only publish statistically significant results

• Focussing on one null and one alternative hypothesis limits scientific advance

• Emphasis on falsification obscures uncertainty about “best” explanation for phenomenon

Misuse of Null Hypothesis Significance Testing

• Failure to reject null hypothesis does not imply null is true

• Probability of obtaining data given null hypothesis is not probability null hypothesis is true

• Poor support for null hypothesis does not imply alternative hypothesis is true

Practical importance of observed difference (rows) against statistical significance (columns):

                 Not significant   Significant
Not important    Happy             Annoyed
Important        Very sad          Elated

The same table, read in terms of sample size n:

                 Not significant   Significant
Not important    n OK              n too large
Important        n too small       n OK

Johnson (1999) “The insignificance of statistical significance testing” J. Wild. Manage.

Null Hypothesis Significance Testing:

• “no longer a sound or fruitful basis for statistical investigation” (Clarke 1963)

• “essential mindlessness in the conduct of research” (Bakan 1966)

• “In practice, of course, tests of significance are not taken seriously” (Guttman 1985)

• “simple P-values are not now used by the best statisticians” (Barnard 1998)

• “The most common and flagrant misuse of statistics... is the testing of hypotheses, especially the vast majority of them known beforehand to be false” (Johnson 1999)

“The problems with Null Hypothesis Significance Testing are so severe that some have argued for it to be completely banned from scholarly journals”

Denis (2003) Theory & Science

Alternatives to Null Hypothesis Significance Testing

• Displays, confidence intervals, effect size statistics

• Model comparison using information-theoretic approaches

• Bayesian statistics

Diversity and latitude

• r = -0.1762; P = 0.002

• r = -0.1762; 95% c.i.: -0.2690; -0.0801
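A confidence interval for a correlation is commonly obtained with the Fisher z transformation. The slides do not say which method was used, so the sketch below is an assumption; `fisher_ci` is a hypothetical helper, and n = 405 is the sample size given in the earlier example. Under these assumptions the interval agrees with the one quoted above to four decimal places.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Confidence interval for a correlation via the Fisher z transformation."""
    z = math.atanh(r)                  # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)          # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Back-transform the interval end points to the correlation scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

lo, hi = fisher_ci(-0.1762, 405)
print(f"r = -0.1762; 95% c.i.: {lo:.4f} to {hi:.4f}")
```

Unlike the P-value, the interval shows both the direction of the relationship and how weak it may be: the whole interval lies below zero, but no part of it is far from zero.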

[Figure: diversity (1 to 6) plotted against latitude (0 to 70), with 95% c.i.]

Diversity and latitude: Maybe by focussing on the diversity-latitude hypothesis, we have missed the real story

[Figure, four panels: diversity against latitude; diversity against SST, by ocean (Atlantic, Pacific); diversity against SST, by area (Galápagos, Gully, Other); diversity against SST]

Effect Size Statistics

• indicate the association that exists between two or more variables

– Pearson’s r correlation coefficient (or r²): for two continuous variables

– Cohen’s d: for one continuous, one two-level category (t-test)

– Hedges’ g: better than d when sample sizes are very different

– Cohen’s f²: for one continuous, one multi-level category (F-test)

– Cramér’s φ: for two categorical variables (χ² test)

– Odds ratio: for two binary variables

Cohen’s d

• d = 0.2 indicative of a small effect size

• d = 0.5 a medium effect size

• d = 0.8 a large effect size

d = (Difference between means of two groups) / (Pooled standard deviation)
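A minimal sketch of the calculation, using invented measurements. The pooled standard deviation here weights each group's variance by its degrees of freedom, and `hedges_g` applies the commonly used approximate small-sample correction factor; both are standard textbook forms rather than anything specified in the slides, and the function names are hypothetical.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference between group means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

def hedges_g(group_a, group_b):
    """Cohen's d multiplied by the approximate correction 1 - 3 / (4N - 9)."""
    n_total = len(group_a) + len(group_b)
    return cohens_d(group_a, group_b) * (1 - 3 / (4 * n_total - 9))

# Hypothetical measurements for two groups.
group_a = [5.1, 4.8, 5.6, 5.0, 5.3]
group_b = [4.4, 4.9, 4.6, 4.2, 4.7]
print(f"d = {cohens_d(group_a, group_b):.2f}, g = {hedges_g(group_a, group_b):.2f}")
```

The result can then be read against the 0.2, 0.5 and 0.8 benchmarks listed above.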

Problems with effect size statistics

• No serious problems

• But they don’t tell the whole story

Model fitting: How can we best predict diversity?

[Figure, four panels: diversity against latitude; diversity against SST, by ocean (Atlantic, Pacific); diversity against SST, by area (Galápagos, Gully, Other); diversity against SST]

Some models of diversity

constant

SST

SST, SST²

SST, SST², SST³

lat

lat, lat²

lat, lat², lat³

SST, SST², lat

SST, SST², lat, lat²

SST, SST², lat, lat², lat³

ocean

SST, SST², ocean

area

SST, SST², area

SST = Sea Surface Temperature; lat = Latitude; ocean = Atlantic / Pacific; area = Ocean area (categorical)

Which model is best?

Model                         Residual sum of squares   Parameters
constant                      0.854                     2
SST                           0.774                     3
SST, SST²                     0.724                     4
SST, SST², SST³               0.726                     5
lat                           0.835                     3
lat, lat²                     0.804                     4
lat, lat², lat³               0.785                     5
SST, SST², lat                0.725                     5
SST, SST², lat, lat²          0.722                     6    (lowest RSS, but many parameters)
SST, SST², lat, lat², lat³    0.724                     7
ocean                         0.844                     3
SST, SST², ocean              0.725                     5
area                          0.831                     4
SST, SST², area               0.723                     6

Which model is best?

• Information-theoretic: AIC

– Akaike Information Criterion

• A measure of the similarity between the statistical model and the true distribution

• Trades off the complexity of a model against how well it fits the data
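The trade-off can be illustrated with the usual least-squares form of AIC, n ln(RSS/n) + 2K, which is the general definition (-2 ln(likelihood) + 2K) for normal errors with constant terms dropped. The slides do not state the exact formula behind their AIC values, so this is a sketch with invented numbers; `aic_least_squares` is a hypothetical helper.

```python
import math

def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit with normal errors, up to an additive constant.

    rss: residual sum of squares
    n:   number of observations
    k:   number of estimated parameters (assumed to include the error variance)
    """
    return n * math.log(rss / n) + 2 * k

# Hypothetical fits to the same 100 observations.
simple = aic_least_squares(rss=40.0, n=100, k=3)
complex_ = aic_least_squares(rss=39.5, n=100, k=4)
print(f"simple: {simple:.2f}, complex: {complex_:.2f}")
```

Here the extra parameter lowers the residual sum of squares slightly, but not by enough to pay its penalty of 2, so the simpler model has the lower (better) AIC.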

Which model is best?

Model                         RSS     Parameters   AIC
constant                      0.854   2            -61.08
SST                           0.774   3            -99.81
SST, SST²                     0.724   4            -125.54   (lowest AIC: best model)
SST, SST², SST³               0.726   5            -123.64
lat                           0.835   3            -69.09
lat, lat²                     0.804   4            -83.19
lat, lat², lat³               0.785   5            -92.19
SST, SST², lat                0.725   5            -124.10
SST, SST², lat, lat²          0.722   6            -125.05
SST, SST², lat, lat², lat³    0.724   7            -123.05
ocean                         0.844   3            -64.88
SST, SST², ocean              0.725   5            -124.27
area                          0.831   4            -69.77
SST, SST², area               0.723   6            -124.59

How much support for different models?

Model                         AIC       ΔAIC    Support
constant                      -61.08    64.46   No support
SST                           -99.81    25.73   No support
SST, SST²                     -125.54   0.00    Best model
SST, SST², SST³               -123.64   1.90    Some support
lat                           -69.09    56.45   No support
lat, lat²                     -83.19    42.35   No support
lat, lat², lat³               -92.19    33.35   No support
SST, SST², lat                -124.10   1.45    Some support
SST, SST², lat, lat²          -125.05   0.49    Some support
SST, SST², lat, lat², lat³    -123.05   2.49    Little support
ocean                         -64.88    60.66   No support
SST, SST², ocean              -124.27   1.27    Some support
area                          -69.77    55.77   No support
SST, SST², area               -124.59   0.96    Some support

Relative importance of variables from AIC

SST 1.000

SST² 1.000

SST³ 0.211

lat 0.398

lat² 0.280

lat³ 0.075

ocean 0.128

area 0.141
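One common way to turn AIC values into measures of support is through AIC differences and Akaike weights, with a variable's relative importance taken as the summed weight of the models that contain it. The sketch below applies that recipe to the 14 AIC values tabulated earlier (with squared and cubed terms written as `SST2`, `SST3` and so on). The slides do not say how their importance values were obtained or whether further models were included, so the numbers printed here need not match the ones listed above.

```python
import math

# AIC values copied from the model table in these slides.
aic = {
    "constant": -61.08,
    "SST": -99.81,
    "SST, SST2": -125.54,
    "SST, SST2, SST3": -123.64,
    "lat": -69.09,
    "lat, lat2": -83.19,
    "lat, lat2, lat3": -92.19,
    "SST, SST2, lat": -124.10,
    "SST, SST2, lat, lat2": -125.05,
    "SST, SST2, lat, lat2, lat3": -123.05,
    "ocean": -64.88,
    "SST, SST2, ocean": -124.27,
    "area": -69.77,
    "SST, SST2, area": -124.59,
}

# Difference between each model's AIC and the lowest AIC in the set.
best_aic = min(aic.values())
delta = {model: value - best_aic for model, value in aic.items()}

# Akaike weights: relative likelihoods exp(-delta / 2), scaled to sum to 1.
rel_lik = {model: math.exp(-d / 2) for model, d in delta.items()}
total = sum(rel_lik.values())
weights = {model: v / total for model, v in rel_lik.items()}

def importance(term):
    """Summed Akaike weight of every model that contains the given term."""
    return sum(w for model, w in weights.items() if term in model.split(", "))

for model in sorted(delta, key=delta.get):
    print(f"{model:<28} dAIC = {delta[model]:6.2f}  weight = {weights[model]:.3f}")
for term in ("SST", "SST2", "SST3", "lat", "lat2", "lat3", "ocean", "area"):
    print(f"{term:<6} {importance(term):.3f}")
```

Because the weights are relative to the set of models considered, adding or dropping candidate models changes every weight, which is one reason the choice of models matters.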

Best model of diversity:

Diversity = 0.293 + 0.261 SST - 0.00614 SST²

[Figure: diversity against SST]

Global pattern of diversity: apply equation to global SST map

Global pattern of diversity: apply equation to SST predictions from global circulation models
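Applying the fitted equation to a map amounts to evaluating it at the SST of every map cell. A minimal sketch, using the coefficients of the best model and a few invented SST values in place of a real map; `predicted_diversity` and `sst_cells` are hypothetical names.

```python
def predicted_diversity(sst):
    """Best-fit quadratic from the slides: diversity as a function of SST."""
    return 0.293 + 0.261 * sst - 0.00614 * sst ** 2

# The curve peaks where its derivative, 0.261 - 2 * 0.00614 * SST, is zero.
peak_sst = 0.261 / (2 * 0.00614)

# Hypothetical SST values standing in for the cells of a global SST map.
sst_cells = [5.0, 12.5, 20.0, 27.5]
predicted = [predicted_diversity(sst) for sst in sst_cells]

print(f"Predicted diversity peaks at SST = {peak_sst:.1f}")
for sst, div in zip(sst_cells, predicted):
    print(f"SST = {sst:4.1f}: predicted diversity = {div:.2f}")
```

The same function can be fed SST values projected by a circulation model, with the caveat that predictions outside the range of SST in the fitted data are extrapolations.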

Advantages and criticisms of information-theoretic model-fitting

Advantages:

• Indicates “best” model and support for other models

• Can compare very different models

• Balances complexity of model against fit

• Produces predictive models

• Fairly simple mathematically and computationally

• Model averaging

Criticisms:

• Philosophical basis “nuanced”

• Which models to consider is subjective

Bayesian Analysis

• Given prior distribution of models or model parameters

• Collect data

• Work out probability of data for each model and combination of model parameters

• Work out posterior distribution of models or model parameters

– using Bayes’ theorem

Bayes’ Theorem

Posterior probability of model given data = (Probability of data given model × Probability of model) / (Probability of data)

Bayesian Analysis

• So, Bayesian analysis gives:

– the probability of models or parameters given prior knowledge and data

– very nice!

– but may need considerable computation

Example of Bayesian Analysis

• Trying to work out survival rate of newly studied species of rodent

• Ten other species in genus have mean survival per year of 0.72 (SD 0.13)

• Of 20 animals marked, 17 survive for 1 year

• Standard (binomial) estimate of survival = 0.850 (95% c.i. 0.621 - 0.968)

• Bayesian estimate of survival = 0.797 (95% c.i. 0.637 - 0.921)
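The Bayesian steps can be sketched with a simple grid approximation. The slides do not state the form of the prior or the computation used, so this sketch assumes a normal prior (mean 0.72, SD 0.13) truncated to the interval from 0 to 1 and a binomial likelihood; its output should be close to, but need not exactly reproduce, the estimate quoted above. The names `grid_posterior` and `summarize` are hypothetical.

```python
import math

def grid_posterior(successes, trials, prior_mean, prior_sd, n_grid=2001):
    """Grid approximation to the posterior of a survival probability."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    unnormalized = []
    for p in grid:
        # Prior: normal density truncated to [0, 1] (an assumption).
        prior = math.exp(-0.5 * ((p - prior_mean) / prior_sd) ** 2)
        # Likelihood: binomial kernel; the constant factor cancels on normalizing.
        likelihood = p ** successes * (1 - p) ** (trials - successes)
        unnormalized.append(prior * likelihood)
    total = sum(unnormalized)  # plays the role of "probability of data"
    return grid, [u / total for u in unnormalized]

def summarize(grid, post, level=0.95):
    """Posterior mean and an equal-tailed credible interval."""
    post_mean = sum(p * w for p, w in zip(grid, post))
    tail = (1 - level) / 2
    cumulative, lower, upper = 0.0, None, None
    for p, w in zip(grid, post):
        cumulative += w
        if lower is None and cumulative >= tail:
            lower = p
        if upper is None and cumulative >= 1 - tail:
            upper = p
    return post_mean, lower, upper

# 17 of 20 marked animals survived; prior from the other species in the genus.
grid, post = grid_posterior(17, 20, prior_mean=0.72, prior_sd=0.13)
post_mean, lower, upper = summarize(grid, post)
print(f"Posterior mean = {post_mean:.3f} (95% interval {lower:.3f} to {upper:.3f})")
```

The posterior mean falls between the prior mean (0.72) and the binomial estimate (0.85): the prior pulls the estimate towards what is known about related species.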

Advantages and Difficulties with Bayesian Analysis

Advantages:

• Philosophically very nice

• Gives probability of model given data and prior information

• Updates estimates as more information becomes available

• Does not give biologically implausible estimates

– e.g. survival >1

• Fits adaptive management paradigm

Difficulties:

• Choice of priors somewhat arbitrary

• Bayesian analysis with “uninformative priors” gives similar results to simpler methods

• Complex

• Computation can be VERY time consuming and opaque

Methods of Inference in Biology

• Descriptive or exploratory analyses

– Displays, confidence intervals, effect size statistics

– Model comparisons using AIC, etc.

– Bayesian analysis (if prior information)

– Null hypothesis significance tests?

• Fitting predictive models

– Model comparisons using AIC, etc.

– Bayesian analysis (if prior information)

• Challenging research hypotheses

– Model comparisons using AIC, etc.

– Null hypothesis significance tests

This class

• Displays, confidence intervals, effect size statistics ***

• Model comparisons using AIC, etc. **

• Bayesian analysis

• Null hypothesis significance tests *
