• Friday: Lab 3 & A3 due
• Mon Oct 1: Exam I this room, 12 pm. Please, no computers or smartphones
• Mon Oct 1: No grad seminar
Next grad seminar: Wednesday, Oct 10
Today: Type II error & Power
Table 7.1 Generic recipe for decision making with statistics
1. State population, conditions for taking sample
2. State the model or measure of pattern ………… ST
3. State null hypothesis about population ………… H0
4. State alternative hypothesis ………… HA
5. State tolerance for Type I error ………… α
6. State frequency distribution that gives probability of outcomes when the null hypothesis is true. Choices:
   a) Permutations: distributions of all possible outcomes
   b) Empirical distribution obtained by random sampling of all possible outcomes when H0 is true
   c) Cumulative distribution function (cdf) that applies when H0 is true
   State assumptions when using a cdf such as Normal, F, t or chi-square
7. Calculate the statistic. This is the observed outcome
8. Calculate p-value for observed outcome relative to distribution of outcomes when H0 is true
9. If p less than α then reject H0 in favour of HA
   If greater than α then do not reject H0
10. Report statistic, p-value, sample size
    Declare decision
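Step 9 of the recipe can be sketched as a one-line rule. The function name `decide` is hypothetical, and treating p exactly equal to α as "do not reject" is an assumption, since the recipe only covers "less than" and "greater than".

```python
def decide(p_value, alpha):
    """Step 9: compare the p-value with the tolerance for Type I error."""
    # Assumption: p == alpha is treated as "do not reject H0".
    if p_value < alpha:
        return "reject H0 in favour of HA"
    return "do not reject H0"


print(decide(0.03, 0.05))
print(decide(0.20, 0.05))
```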
Table 7.2 Key for choosing a FD of a statistic
Statistic of the population is a mean
  If data are normal or cluster around a central value
    If sample size is large (n>30) ………… Normal distribution
    If sample size is small (n<30) ………… t distribution
  If data are Poisson ………… Poisson distribution
  If data are Binomial ………… Binomial distribution
  If data do not cluster around central value, examine residuals
    If residuals are normal or cluster around a central value
      If sample size is large (n>30) ………… Normal distribution
      If sample size is small (n<30) ………… t distribution
    If residuals are not normal ………… Empirical distribution
Statistic of the population is a variance
  If data are normal or cluster around a central value ………… Chi-square
  If data do not cluster around a central value
    If sample size is large (n>30) ………… Chi-square distribution
    If sample size is small (n<30) ………… Empirical
Table 7.2 Key for choosing a FD of a statistic - continued
Statistic of the population: ratio of 2 variances (ANOVA tables)
  If data are normal or cluster around a central value ………… F distribution
  If data do not cluster around central value, calculate residuals
    If residuals are normal or cluster around a central value ………… F distribution
    If residuals do not cluster around a central value
      If sample size is large (n>30) ………… F distribution
      If sample size is small (n<30) ………… Empirical
Statistic is none of the above
  Search statistical literature for appropriate distribution or confer with a statistician
  If not in literature or cannot be found ………… Empirical
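The key in Table 7.2 reads like a nested decision rule, so it can be sketched as a small function. This is only an illustration: the function name, argument names and the string labels are hypothetical, the Poisson and Binomial branches are left out, and n = 30 (which the key does not cover) is treated as a small sample by assumption.

```python
def choose_distribution(statistic, data_cluster, n, residuals_cluster=False):
    """Follow Table 7.2 for a mean, a variance, or a ratio of two variances."""
    large = n > 30  # assumption: n == 30 counts as small
    if statistic == "mean":
        if data_cluster or residuals_cluster:
            return "Normal" if large else "t"
        return "Empirical"
    if statistic == "variance":
        if data_cluster:
            return "Chi-square"
        return "Chi-square" if large else "Empirical"
    if statistic == "variance_ratio":
        if data_cluster or residuals_cluster:
            return "F"
        return "F" if large else "Empirical"
    # None of the above: the key says to search the literature first.
    return "Search literature, else Empirical"


print(choose_distribution("mean", True, 10))
print(choose_distribution("variance_ratio", True, 10))
```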
Example: jackal bones - revisited
1.
2.
3.
4.
5.
Example: jackal bones - revisited
6. Key
7.
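Pooled-variance t statistic:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
With equal sample sizes (n_1 = n_2 = n):
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{1}{n}\left(s_1^2 + s_2^2\right)}}$$
Pooled variance:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$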
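A minimal sketch of step 7 for a two-sample comparison of means, assuming the pooled-variance t statistic: the difference between the two sample means divided by the square root of the pooled variance times (1/n1 + 1/n2). The function names and the input numbers are hypothetical and are not the jackal data.

```python
import math


def pooled_variance(var1, var2, n1, n2):
    """Weighted average of the two sample variances."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def pooled_t(mean1, mean2, var1, var2, n1, n2):
    """Return the t statistic and its degrees of freedom."""
    sp2 = pooled_variance(var1, var2, n1, n2)
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


# Hypothetical summary statistics, chosen only to make the arithmetic easy.
t, df = pooled_t(10.0, 8.0, 4.0, 4.0, 8, 8)
print(t, df)
```

With equal sample sizes the pooled variance is the plain average of the two variances, so the same result follows from dividing by sqrt((s1^2 + s2^2)/n).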
Example: jackal bones - revisited
8. Calculate p from t dist
Example: jackal bones - revisited
9.
10.
Example: jackal bones - revisited
Is your data normal?
It really does not matter!
The assumption is that the residuals follow a normal distribution
Example: jackal bones - revisited
Are your residuals normal?
[Figure: histogram of residuals. x-axis: Residuals (-5 to 5); y-axis: Frequency (0 to 5)]
Example: roach survival
Data:
Survival (Ts) in days of the roach Blattella vaga when kept without food or water
Females n=10 mean(Ts)=8.5 days var(Ts)=3.6 days^2
Males n=10 mean(Ts)=4.8 days var(Ts)=0.9 days^2
Is the variation in survival time equal between male and female roaches?
Data from Sokal & Rohlf 1995, p 189
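One way to sketch steps 7 and 8 for this question, assuming the statistic is the ratio of the larger to the smaller sample variance and that it follows an F distribution with (n1 - 1, n2 - 1) degrees of freedom, as the key suggests for a ratio of two variances. The helper names are hypothetical. The tail probability is obtained here by Simpson's rule integration of a beta density instead of a library call, and the shortcut at the interval ends assumes both degrees of freedom exceed 2.

```python
import math


def beta_pdf(t, a, b):
    """Beta(a, b) density; assumes a > 1 and b > 1 so the ends are zero."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta)


def f_upper_tail(f, df1, df2, steps=2000):
    """P(F > f) via the incomplete beta integral, using Simpson's rule."""
    x = df2 / (df2 + df1 * f)
    a, b = df2 / 2, df1 / 2
    h = x / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3


# Summary statistics from the slide: females var = 3.6, males var = 0.9, n = 10 each.
f_stat = 3.6 / 0.9
p_upper = f_upper_tail(f_stat, 9, 9)
p_two_tailed = 2 * p_upper  # the question is "equal or not", so both tails count
print(f_stat, p_upper, p_two_tailed)
```

The two-tailed value is then compared with the tolerance for Type I error chosen in step 5.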
Example: roach survival
1.
2.
3.
4.
5.
Example: roach survival
6. Key
7.
8.
Example: roach survival
9.
10.
Parameters
Formal models (equations) consist of variable quantities and parameters
Parameters have a fixed value in a particular situation
Parameters are found in
functional expressions of causal relations
statistical or empirical functions
theoretical frequency distributions
Parameters are obtained from data by estimation
Parameters - examples
1. Functional relationship. Scallops density
Mscal=k1 if R=5 or 6
Mscal=k2 if R not equal to 5 or 6
Mscal = kg caught per unit area of seafloor
R = sediment roughness from 1 (sand) to 100 (cobble)
k = mean scallop catch
Red for params, blue for variables
Parameters - examples
2. Statistical relationship. Morphoedaphic equation
Mfish = 1.38 MEI^0.4661
Mfish = kg ha^-1 yr^-1, fish caught per year from lake
MEI = ppm m^-1, dissolved organics/lake depth
Parameters: exponent 0.4661; coefficient 1.38 kg ha^-1 ppm^-0.4661 m^0.4661
Red for params, blue for variables
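To make the split between parameters and variables concrete, the morphoedaphic equation can be written as a function in which MEI is the variable and the coefficient 1.38 and exponent 0.4661 are the parameters. The function and argument names are hypothetical.

```python
def fish_yield(mei, coefficient=1.38, exponent=0.4661):
    """Fish yield (kg per ha per yr) predicted from the morphoedaphic index."""
    # The two defaults are the parameter values quoted on the slide.
    return coefficient * mei ** exponent


# The parameters stay fixed while the variable MEI changes from lake to lake.
for mei in (1.0, 10.0, 100.0):
    print(mei, fish_yield(mei))
```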
Parameters - examples
3. Frequency distribution. Normal distribution
Red for params, blue for variables
$$\text{pdf} = f(Y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}X^2} \quad \text{where} \quad X = \frac{Y - \mu}{\sigma}$$
μ = mean
σ = standard deviation
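A short sketch of the normal probability density function with its two parameters, the mean μ and standard deviation σ, and the variable Y. The function name is hypothetical.

```python
import math


def normal_pdf(y, mu, sigma):
    """Normal density at y for mean mu and standard deviation sigma."""
    x = (y - mu) / sigma  # standardized deviation from the mean
    return math.exp(-0.5 * x * x) / (sigma * math.sqrt(2 * math.pi))


# The density peaks at the mean and is symmetric around it.
print(normal_pdf(0.0, 0.0, 1.0))
print(normal_pdf(-1.0, 0.0, 1.0), normal_pdf(1.0, 0.0, 1.0))
```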
Parameter estimates
1. Scallops density
Mscal = μ1 if R = 5 or 6
Mscal = μ2 if R not equal to 5 or 6
Theoretical model to calculate μ1 and μ2? Non-existent
Estimate from data recorded in 28 tows:
Mscal = μ1 = mean(Mscal for R = 5 or 6), n = 13
Mscal = μ2 = mean(Mscal for R not equal to 5 or 6), n = 15
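Estimating the two scallop parameters amounts to taking a mean within each roughness group. The sketch below assumes each tow is stored as a (roughness, catch) pair. The function name and the four tows are hypothetical stand-ins, not the 28 recorded tows.

```python
from statistics import mean


def estimate_group_means(tows):
    """Return (mu1, mu2): mean catch for R in {5, 6} and for all other R."""
    on_target = [catch for roughness, catch in tows if roughness in (5, 6)]
    off_target = [catch for roughness, catch in tows if roughness not in (5, 6)]
    return mean(on_target), mean(off_target)


# Hypothetical tows: (sediment roughness R, catch in kg per unit area).
example_tows = [(5, 2.0), (6, 4.0), (10, 1.0), (50, 3.0)]
mu1, mu2 = estimate_group_means(example_tows)
print(mu1, mu2)
```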
Parameter estimates
2. Ryder’s morphoedaphic equation
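$$pM = \alpha\, MEI^{\beta}$$
Population: $$\ln(pM) = \ln(\alpha) + \beta \ln(MEI)$$
Sample: $$Y = a + \hat{\beta}_{MEI} \ln(MEI)$$
With X = ln(MEI) and Y = ln(pM), the estimates are
$$\hat{\beta}_{MEI} = \frac{Cov(X, Y)}{Var(X)}$$
$$Cov(X, Y) = (n - 1)^{-1} \sum (X - \bar{X})(Y - \bar{Y})$$
$$a = \bar{Y} - \hat{\beta}_{MEI}\, \bar{X}, \quad \bar{Y} = mean(Y)$$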
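The analytic estimates for the morphoedaphic equation can be sketched as follows, assuming the usual least-squares formulas on the log-log scale: slope = Cov(X, Y) / Var(X) and intercept = mean(Y) - slope * mean(X), with X = ln(MEI) and Y = ln(pM). The function name is hypothetical, and the data are synthetic values generated from the equation itself, used only to check that the estimator recovers known parameters.

```python
import math


def fit_power_law(mei_values, yields):
    """Estimate alpha and beta in yield = alpha * MEI**beta by log-log regression."""
    x = [math.log(v) for v in mei_values]
    y = [math.log(v) for v in yields]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    slope = cov_xy / var_x
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), slope  # back-transform the intercept to alpha


# Synthetic, noise-free data built from alpha = 1.38 and beta = 0.4661.
mei = [1.0, 2.0, 5.0, 10.0, 20.0, 40.0]
synthetic_yield = [1.38 * v ** 0.4661 for v in mei]
alpha_hat, beta_hat = fit_power_law(mei, synthetic_yield)
print(alpha_hat, beta_hat)
```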
Statistical Inference
Two categories:
1. Hypothesis testing
Make decisions about an unknown population parameter
2. Estimation
Estimate specific values of an unknown population parameter
Parameters
Estimation:
1. Analytic formula
e.g. slope, mean
2. Iterative methods
criterion: maximize the likelihood of the parameter
common ways to measure the likelihood:
sums of squared deviations of data from model
G-statistic (Poisson, binomial)
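As an illustration of an iterative method, the sketch below searches for the value that minimizes the sum of squared deviations of the data from a one-parameter model (a single mean). For normally distributed errors, minimizing this sum is equivalent to maximizing the likelihood. Golden-section search is a choice made here for illustration, not a method named in the lecture, and the data values and function names are hypothetical.

```python
import math


def sum_of_squares(m, data):
    """Sum of squared deviations of the data from the candidate value m."""
    return sum((y - m) ** 2 for y in data)


def golden_section_min(f, lo, hi, tol=1e-6):
    """Shrink [lo, hi] around the minimum of a single-minimum function f."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    return (lo + hi) / 2


data = [2.0, 4.0, 6.0, 9.0]  # hypothetical observations
iterative = golden_section_min(lambda m: sum_of_squares(m, data), min(data), max(data))
analytic = sum(data) / len(data)
print(iterative, analytic)
```

For this simple model the iterative answer should agree with the analytic formula (the arithmetic mean), which is why analytic formulas are used when they exist.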
Parameters
Uncertainty:
Confidence limit:
Two values between which we have a specified level of confidence (e.g. 95%) that the population parameter lies
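A sketch of 95% confidence limits for a mean, using the female roach summary statistics from earlier in the lecture (n = 10, mean = 8.5 days, variance = 3.6). It assumes a t distribution with n - 1 = 9 degrees of freedom, as the key suggests for a small sample, and hardcodes the tabled two-sided 95% critical value of 2.262 for 9 degrees of freedom. The function name is hypothetical.

```python
import math


def confidence_limits(sample_mean, sample_variance, n, t_critical):
    """Lower and upper limits: mean plus or minus t times the standard error."""
    standard_error = math.sqrt(sample_variance / n)
    half_width = t_critical * standard_error
    return sample_mean - half_width, sample_mean + half_width


# t_critical = 2.262 is the tabled value for 9 df at 95% (two-sided).
lower, upper = confidence_limits(8.5, 3.6, 10, 2.262)
print(lower, upper)
```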