Inference in Biology
BIOL4062/5062
Hal Whitehead
• What are we trying to do?
• Null Hypothesis Significance Testing
• Problems with Null Hypothesis Significance Testing
• Alternatives:
– Displays, confidence intervals, effect size statistics
– Model comparison using information-theoretic approaches
– Bayesian analysis
• Methods of Inference in Biology
What are we trying to do?
• Descriptive or exploratory analyses
– What factors influence species diversity?
• Fitting predictive models
– Can we make global maps of species diversity?
• Challenging research hypotheses
– Is diversity inversely related to latitude?
The traditional approach: Null Hypothesis Significance Testing
• Formulate null hypothesis
• Formulate alternative hypothesis
• Decide on test statistic
• Collect data
• What is probability (P) of test statistic, or more extreme value, under null hypothesis?
• If P<α (usually 0.05) conclude:
– Reject null in favour of alternative
• If P>α conclude:
– Do not reject null hypothesis
Null Hypothesis Significance Testing: An example
• Formulate null hypothesis
– “Species diversity does not change with latitude”
• Formulate alternative hypothesis
– “Species diversity decreases with latitude”
• Decide on test statistic
– Correlation between diversity measure and latitude, r
• Collect data
– 405 measures of diversity at different latitudes
• What is probability (P) of test statistic, or more extreme value, under null hypothesis?
– r = -0.1762; P = 0.002 (one-sided)
• If P<α (usually 0.05) conclude:
– Reject “Species diversity does not change with latitude”
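The steps above can be sketched in code. The slides do not say how P was obtained, so the permutation test below is only one possible choice, and the data are synthetic stand-ins rather than the 405 course measurements. The function names, the seed and the number of permutations are illustrative assumptions.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def one_sided_permutation_p(x, y, n_perm=1000, seed=1):
    """Share of shuffles with r at least as negative as the observed r.

    Shuffling y breaks any link with x, which mimics the null hypothesis
    "diversity does not change with latitude".
    """
    rng = random.Random(seed)
    r_obs = pearson_r(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if pearson_r(x, y_perm) <= r_obs:
            hits += 1
    # Add 1 to numerator and denominator so P is never exactly zero.
    return r_obs, (hits + 1) / (n_perm + 1)

# Synthetic data (an assumption, NOT the course data): weak negative trend.
data_rng = random.Random(0)
latitude = [data_rng.uniform(0, 70) for _ in range(405)]
diversity = [3.5 - 0.01 * lat + data_rng.gauss(0, 1.0) for lat in latitude]

r, p = one_sided_permutation_p(latitude, diversity)
alpha = 0.05
decision = "reject null" if p < alpha else "do not reject null"
print(f"r = {r:.4f}, one-sided P = {p:.4f}: {decision}")
```

The one-sided comparison (`<=`) matches the alternative hypothesis that diversity decreases with latitude. A two-sided test would compare absolute values instead.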
Criticisms of: Null Hypothesis Significance Testing (1)
• α is arbitrary
• Most null hypotheses are false, so why test them?
• Statistical significance is not equivalent to biological significance
– with large samples, statistical significance but not biological significance
– with small samples, biological significance but not statistical significance
• If statistical power is low, the null hypothesis will usually not be rejected when false
• Encourages arbitrary inferences when many tests carried out
Criticisms of: Null Hypothesis Significance Testing (2)
• Power analysis does not save NHST
– arbitrary, confounded with P-value
– “vacuous intellectual game” (Shaver 1993)
• Incomplete reporting and publishing
– only report statistically significant results
– only publish statistically significant results
• Focussing on one null and one alternative hypothesis limits scientific advance
• Emphasis on falsification obscures uncertainty about “best” explanation for phenomenon
Misuse of: Null Hypothesis Significance Testing
• Failure to reject null hypothesis does not imply null is true
• Probability of obtaining data given null hypothesis is not probability null hypothesis is true
• Poor support for null hypothesis does not imply alternative hypothesis is true
Practical importance of observed difference (rows) against statistical significance (columns):
                 Not significant   Significant
Not important    Happy             Annoyed
Important        Very sad          Elated
Johnson (1999) “The insignificance of statistical significance testing” J. Wild. Manage.
Practical importance of observed difference (rows) against statistical significance (columns):
                 Not significant   Significant
Not important    n OK              n too large
Important        n too small       n OK
Null Hypothesis Significance Testing:
• “no longer a sound or fruitful basis for statistical investigation” (Clarke 1963)
• “essential mindlessness in the conduct of research” (Bakan 1966)
• “In practice, of course, tests of significance are not taken seriously” (Guttman 1985)
• “simple P-values are not now used by the best statisticians” (Barnard 1998)
• “The most common and flagrant misuse of statistics... is the testing of hypotheses, especially the vast majority of them known beforehand to be false” (Johnson 1999)
“The problems with Null Hypothesis Significance Testing
are so severe that some have argued for it to be completely
banned from scholarly journals”
Denis (2003) Theory & Science
Alternatives to Null Hypothesis Significance Testing:
• Displays, confidence intervals, effect size statistics
• Model comparison using information-theoretic approaches
• Bayesian statistics
Diversity and latitude
• r = -0.1762; P = 0.002
• r = -0.1762; 95% c.i.: -0.2690; -0.0801
[Figure: Diversity against Latitude, with 95% c.i.]
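The slide does not say how the 95% confidence interval for r was obtained. One standard route is the Fisher z-transformation, sketched below on the assumption that the 405 measures are independent. With r = -0.1762 and n = 405 it gives an interval that agrees with the slide's values to about three decimal places.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3) when observations are independent.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lower, upper = fisher_ci(-0.1762, 405)
print(f"r = -0.1762; 95% c.i.: {lower:.4f}; {upper:.4f}")
```

The interval excludes zero, which carries the same message as the small P-value, but it also shows how weak the correlation could plausibly be.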
Diversity and latitude: Maybe by focussing on the diversity-latitude hypothesis, we have missed the real story
[Figure: Diversity against Latitude]
[Figure: Diversity against SST; legend: Atlantic, Pacific]
[Figure: Diversity against SST; legend: Galápagos, Gully, Other]
[Figure: Diversity against SST]
Effect Size Statistics
• indicate the association that exists between two or more variables
– Pearson’s r correlation coefficient (or r²)
• for two continuous variables
– Cohen’s d
• for one continuous, one two-level category (t-test)
– Hedges’ g
• better than d when sample sizes are very different
– Cohen’s f²
• for one continuous, one multi-level category (F-test)
– Cramér’s φ
• for two categorical variables (Chi² test)
– Odds ratio
• for two binary variables
Cohen’s d
• d = 0.2 indicative of a small effect size
• d = 0.5 a medium effect size
• d = 0.8 a large effect size
d = (Difference between means of two groups) / (Pooled standard deviation)
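As a minimal sketch of the definition above, the code below computes Cohen's d using the usual pooled standard deviation (group variances weighted by their degrees of freedom). The two samples are made-up numbers for illustration. Treating 0.2, 0.5 and 0.8 as hard cut-offs, and the "negligible" label, are assumptions of this sketch; the slide offers them only as indicative values.

```python
from statistics import mean, variance

def cohens_d(a, b):
    """Difference between group means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

def describe(d):
    """Rough verbal label based on the slide's benchmark values."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label not on the slide; an assumption

# Hypothetical measurements for two groups.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]

d = cohens_d(group_a, group_b)
print(f"d = {d:.3f} ({describe(d)})")
```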
Problems with effect size statistics
• No serious problems
• But they don’t tell the whole story
Model fitting: How can we best predict diversity?
[Figure: Diversity against Latitude]
[Figure: Diversity against SST; legend: Atlantic, Pacific]
[Figure: Diversity against SST; legend: Galápagos, Gully, Other]
[Figure: Diversity against SST]
Some models of diversity
constant
SST
SST, SST²
SST, SST², SST³
lat
lat, lat²
lat, lat², lat³
SST, SST², lat
SST, SST², lat, lat²
SST, SST², lat, lat², lat³
ocean
SST, SST², ocean
area
SST, SST², area
SST = Sea Surface Temperature; lat = Latitude; ocean = Atlantic/Pacific; area = Ocean area (categorical)
Which model is best?
Model                         Residual sum of squares   Parameters
constant                      0.854                     2
SST                           0.774                     3
SST, SST²                     0.724                     4
SST, SST², SST³               0.726                     5
lat                           0.835                     3
lat, lat²                     0.804                     4
lat, lat², lat³               0.785                     5
SST, SST², lat                0.725                     5
SST, SST², lat, lat²          0.722                     6
SST, SST², lat, lat², lat³    0.724                     7
ocean                         0.844                     3
SST, SST², ocean              0.725                     5
area                          0.831                     4
SST, SST², area               0.723                     6
Note: SST, SST², lat, lat² has the lowest RSS but many parameters
Which model is best?
• Information-theoretic AIC
– Akaike Information Criterion
• A measure of the similarity between the statistical model and the true distribution
• Trades off the complexity of a model against how well it fits the data
Which model is best?
Model                         RSS     Parameters   AIC
constant                      0.854   2            -61.08
SST                           0.774   3            -99.81
SST, SST²                     0.724   4            -125.54
SST, SST², SST³               0.726   5            -123.64
lat                           0.835   3            -69.09
lat, lat²                     0.804   4            -83.19
lat, lat², lat³               0.785   5            -92.19
SST, SST², lat                0.725   5            -124.10
SST, SST², lat, lat²          0.722   6            -125.05
SST, SST², lat, lat², lat³    0.724   7            -123.05
ocean                         0.844   3            -64.88
SST, SST², ocean              0.725   5            -124.27
area                          0.831   4            -69.77
SST, SST², area               0.723   6            -124.59
Note: SST, SST² has the lowest AIC: best model
How much support for different models?
Model                         AIC       ΔAIC    Support
constant                      -61.08    64.46   No support
SST                           -99.81    25.73   No support
SST, SST²                     -125.54   0.00    Best model
SST, SST², SST³               -123.64   1.90    Some support
lat                           -69.09    56.45   No support
lat, lat²                     -83.19    42.35   No support
lat, lat², lat³               -92.19    33.35   No support
SST, SST², lat                -124.10   1.45    Some support
SST, SST², lat, lat²          -125.05   0.49    Some support
SST, SST², lat, lat², lat³    -123.05   2.49    Little support
ocean                         -64.88    60.66   No support
SST, SST², ocean              -124.27   1.27    Some support
area                          -69.77    55.77   No support
SST, SST², area               -124.59   0.96    Some support
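ΔAIC is each model's AIC minus the lowest AIC in the set. The sketch below recomputes it from the rounded AIC values in the table, so the last digit can differ slightly from the slide. It also converts ΔAIC into Akaike weights, a commonly used normalisation that does not appear on the slide and is included here as an assumption about how "support" might be quantified.

```python
import math

# AIC values copied from the table; "SST2" stands for SST squared, and so on.
aic = {
    "constant": -61.08,
    "SST": -99.81,
    "SST, SST2": -125.54,
    "SST, SST2, SST3": -123.64,
    "lat": -69.09,
    "lat, lat2": -83.19,
    "lat, lat2, lat3": -92.19,
    "SST, SST2, lat": -124.10,
    "SST, SST2, lat, lat2": -125.05,
    "SST, SST2, lat, lat2, lat3": -123.05,
    "ocean": -64.88,
    "SST, SST2, ocean": -124.27,
    "area": -69.77,
    "SST, SST2, area": -124.59,
}

best = min(aic, key=aic.get)
delta = {model: value - aic[best] for model, value in aic.items()}

# Akaike weight: relative likelihood exp(-delta / 2), scaled to sum to 1.
rel_lik = {model: math.exp(-d / 2) for model, d in delta.items()}
total = sum(rel_lik.values())
weights = {model: v / total for model, v in rel_lik.items()}

for model in sorted(aic, key=aic.get):
    print(f"{model:28s} dAIC = {delta[model]:6.2f}  weight = {weights[model]:.3f}")
```

Models with ΔAIC above about 25 receive weights that are effectively zero, which fits the "No support" labels in the table.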
Relative importance of variables from AIC
SST 1.000
SST² 1.000
SST³ 0.211
lat 0.398
lat² 0.280
lat³ 0.075
ocean 0.128
area 0.141
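The slide does not state how these importance values were computed. One common definition is the sum of Akaike weights over all models that contain a given term, sketched below using only the 14 models and rounded AIC values listed earlier. Under that assumption the results land near some of the slide's values (for example about 0.38 for lat) but not all of them, so the slide may have used a different model set or method.

```python
import math

# (terms in model, AIC) for the 14 listed models; "SST2" means SST squared.
models = [
    ((), -61.08),                                    # constant only
    (("SST",), -99.81),
    (("SST", "SST2"), -125.54),
    (("SST", "SST2", "SST3"), -123.64),
    (("lat",), -69.09),
    (("lat", "lat2"), -83.19),
    (("lat", "lat2", "lat3"), -92.19),
    (("SST", "SST2", "lat"), -124.10),
    (("SST", "SST2", "lat", "lat2"), -125.05),
    (("SST", "SST2", "lat", "lat2", "lat3"), -123.05),
    (("ocean",), -64.88),
    (("SST", "SST2", "ocean"), -124.27),
    (("area",), -69.77),
    (("SST", "SST2", "area"), -124.59),
]

best_aic = min(a for _, a in models)
rel_lik = [math.exp(-(a - best_aic) / 2) for _, a in models]
total = sum(rel_lik)
weights = [v / total for v in rel_lik]

# Importance of a term = summed weight of every model that includes it.
terms = ["SST", "SST2", "SST3", "lat", "lat2", "lat3", "ocean", "area"]
importance = {
    term: sum(w for (model_terms, _), w in zip(models, weights) if term in model_terms)
    for term in terms
}

for term in terms:
    print(f"{term:6s} {importance[term]:.3f}")
```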
Best model of diversity:
Diversity = 0.293 + 0.261 SST - 0.00614 SST²
[Figure: Diversity against SST]
Global pattern of diversity: apply equation to global SST map
Global pattern of diversity: apply equation to SST predictions from global circulation models
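Applying the equation to a map amounts to evaluating the fitted quadratic at every grid cell. The sketch below does this for a tiny made-up SST grid. The grid values and the assumption that SST is in degrees Celsius are illustrative only and do not come from the slides.

```python
def predicted_diversity(sst):
    """Best-model prediction: Diversity = 0.293 + 0.261 SST - 0.00614 SST^2."""
    return 0.293 + 0.261 * sst - 0.00614 * sst ** 2

# Hypothetical 3 x 3 grid of sea surface temperatures (assumed degrees C).
sst_grid = [
    [28.0, 27.5, 25.0],
    [20.0, 18.5, 15.0],
    [8.0, 5.0, 2.0],
]

diversity_map = [[predicted_diversity(t) for t in row] for row in sst_grid]
for row in diversity_map:
    print("  ".join(f"{value:5.2f}" for value in row))

# The quadratic peaks where its derivative is zero: 0.261 - 2 * 0.00614 * SST = 0.
peak_sst = 0.261 / (2 * 0.00614)
print(f"Predicted diversity peaks near SST = {peak_sst:.1f}")
```

Predictions for temperatures outside the range of the fitted data (roughly 5 to 35 on the plotted SST axis) are extrapolations and should be read with caution.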
Advantages and criticisms of information-theoretic model-fitting
• Indicates “best” model and support for other models
• Can compare very different models
• Balances complexity of model against fit
• Produces predictive models
• Fairly simple mathematically and computationally
• Model averaging
• Philosophical basis “nuanced”
• Which models to consider is subjective
Bayesian Analysis
• Given prior distribution of models or model parameters
• Collect data
• Work out probability of data for each model and combination of model parameters
• Work out posterior distribution of models or model parameters
– using Bayes’ theorem
Bayes’ Theorem
Posterior probability of model given data = (Probability of data given model × Probability of model) / (Probability of data)
Bayesian Analysis
• So, Bayesian analysis gives:
– the probability of models or parameters given prior knowledge and data
– very nice!
– but may need considerable computation
Example of Bayesian Analysis
• Trying to work out survival rate of newly studied species of rodent
• Ten other species in genus have mean survival per year of 0.72 (SD 0.13)
• Of 20 animals marked, 17 survive for 1 year
• Standard (binomial) estimate of survival = 0.850 (95% c.i. 0.621 - 0.968)
• Bayesian estimate of survival = 0.797 (95% c.i. 0.637 - 0.921)
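The slides do not state which prior distribution was used for this example. The sketch below assumes a Beta prior matched to the genus mean and SD, combines it with the binomial likelihood on a grid of survival values, and reads the posterior mean and a 95% interval off the grid. Under that assumption the posterior mean is about 0.80, close to but not identical with the slide's 0.797, so the original analysis presumably used a somewhat different prior or summary.

```python
import math

prior_mean, prior_sd = 0.72, 0.13   # from the ten other species in the genus
survived, marked = 17, 20           # data for the new species

# Assumption: Beta(a, b) prior with the same mean and SD (method of moments).
k = prior_mean * (1 - prior_mean) / prior_sd ** 2 - 1
a_prior, b_prior = prior_mean * k, (1 - prior_mean) * k

# Grid of candidate survival rates strictly inside (0, 1).
n_grid = 10_000
theta = [(i + 0.5) / n_grid for i in range(n_grid)]

# Bayes' theorem on the log scale: log prior + log likelihood (constants dropped).
log_post = [
    (a_prior - 1 + survived) * math.log(t)
    + (b_prior - 1 + marked - survived) * math.log(1 - t)
    for t in theta
]
peak = max(log_post)
unnormalised = [math.exp(lp - peak) for lp in log_post]
total = sum(unnormalised)           # plays the role of "probability of data"
posterior = [u / total for u in unnormalised]

post_mean = sum(t * p for t, p in zip(theta, posterior))

def quantile(q):
    """Smallest grid value whose cumulative posterior probability reaches q."""
    cumulative = 0.0
    for t, p in zip(theta, posterior):
        cumulative += p
        if cumulative >= q:
            return t
    return theta[-1]

lower, upper = quantile(0.025), quantile(0.975)
print(f"Binomial estimate: {survived / marked:.3f}")
print(f"Posterior mean: {post_mean:.3f} (95% interval {lower:.3f} - {upper:.3f})")
```

The prior pulls the estimate from the raw proportion of 0.850 towards the genus mean of 0.72, and the interval stays inside 0 to 1 by construction.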
Advantages and Difficulties with Bayesian Analysis
• Philosophically very nice
• Gives probability of model given data and prior information
• Updates estimates as more information becomes available
• Does not give biologically implausible estimates
– e.g. survival >1
• Fits adaptive management paradigm
• Choice of priors somewhat arbitrary
• Bayesian analysis with “uninformative priors” gives similar results to simpler methods
• Complex
• Computation can be VERY time consuming and opaque
Methods of Inference in Biology
• Descriptive or exploratory analyses
– Displays, confidence intervals, effect size statistics
– Model comparisons using AIC, etc
– Bayesian analysis (if prior information)
– Null hypothesis significance tests?
• Fitting predictive models
– Model comparisons using AIC, etc
– Bayesian analysis (if prior information)
• Challenging research hypotheses
– Model comparisons using AIC, etc
– Null hypothesis significance tests
This class
• Displays, confidence intervals, effect size statistics ***
• Model comparisons using AIC, etc **
• Bayesian analysis
• Null hypothesis significance tests *