BMI 541/699: Lecture 11
We have covered:
1. Introduction and Experimental Design
2. Exploratory Data Analysis
3. Probability
4. Distribution of sample statistics
5. Testing hypotheses about the population mean(s)
- One sample t-test
- Two sample t-test (two sided p-value)
- Relationship between the one sample t-test and the confidence interval for the mean
- Confidence interval for the difference of two means
- Paired t-test
- Using R for t-tests and confidence intervals
Review: Assumptions
The hypothesis tests and confidence intervals that we have learned are valid if:
1. All samples are simple random samples.
2. The sampling distribution of the sample mean(s) is approximately normal.
This is true when:
- the distribution of X is not too skewed
- the sample size is large enough for the central limit theorem to apply
Large samples
We have discussed large samples in two settings.
• Samples that are large enough for the central limit theorem to hold. The needed sample size depends on the skewness of the population distribution of our variable (I said n ≥ 30 but it really depends on the data).
• Samples that are large enough so that the normal distribution can be used for confidence intervals and t-tests. n ≥ 50 is sufficient but it's better to just use the t-distribution for all samples.
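To see why the t-distribution is the safer default, the short R sketch below compares the upper 2.5% critical value of the t-distribution with the normal value 1.96. The sample sizes are illustrative choices, not values prescribed by the lecture.

```r
# Illustrative sample sizes (assumed for this sketch); df = n - 1
n <- c(8, 14, 30, 50, 237)

# Upper 2.5% point of the t distribution for each sample size
t_crit <- qt(0.975, df = n - 1)

# Upper 2.5% point of the standard normal distribution
z_crit <- qnorm(0.975)

# Side-by-side comparison: t_crit shrinks toward z_crit as n grows
data.frame(n = n, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3))
```

The t critical value is always larger than the normal one, so a normal-based interval is slightly too narrow for small n, while the two nearly agree for large n.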
Checking Assumptions for t-based confidence intervals and tests
1. All samples must be simple random samples.
   a. Every individual in the population has an equal chance of being in the sample.
   b. The fact that one individual is in the sample does not change the probability that any other individual is in the sample (independence).
2. The distribution of the sample mean(s) must be approximately normal. We can check this:
   a. Create a histogram of the sample values (one for each sample if doing a 2 sample t-test).
   b. Check that it is not too skewed.
      - Small sample sizes require symmetric distributions for the CLT to hold.
      - Large sample sizes can have more skewed distributions and the CLT will still hold.
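The effect of sample size on the shape of the sampling distribution can be illustrated with a small simulation. This is only a sketch: the exponential population, the sample sizes 5 and 50, the number of replications, and the helper names `skewness` and `sim_means` are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

# Moment-based sample skewness: third central moment over the cubed SD
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

# Draw `reps` samples of size n from a right-skewed exponential population
# and return the sample mean of each one
sim_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

means_small <- sim_means(5)    # sample means for n = 5
means_large <- sim_means(50)   # sample means for n = 50

skew_small <- skewness(means_small)
skew_large <- skewness(means_large)
c(n5 = skew_small, n50 = skew_large)

# hist(means_small); hist(means_large)  # uncomment to view the histograms
```

With a strongly skewed population, the sample means for n = 5 are still visibly skewed, while those for n = 50 are much closer to symmetric.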
Review: Confidence Interval for the population mean.
If
• the above assumptions hold
• the variable X has mean = µ and sd = σ
then the 95% confidence interval for µ is:
$$\left(\bar{x} - t_{0.025,\,n-1}\,\frac{s}{\sqrt{n}},\;\; \bar{x} + t_{0.025,\,n-1}\,\frac{s}{\sqrt{n}}\right)$$
where $t_{0.025,\,n-1}$ is the number t such that $\Pr(T_{n-1} > t) = 0.025$.
[Figure: T_{n−1} density with area = 0.025 in each tail, beyond −t0.025, n−1 and t0.025, n−1.]
Connection between Hypothesis Testing and Confidence Intervals
Suppose that we have n = 8 observations from N(µ, σ²).
We observe x̄ = 37.1 and s = 2.51.
We want to test H0 : µ = 35 vs. HA : µ ≠ 35.
If we reject H0 when the P-value is less than 5% then 5% is called the α level or significance level or size of the test.
• Test statistic: t = (37.1 − 35)/(2.51/√8) = 2.368.
• P-value = 2 × Pr(T7 ≥ 2.368) = 2 × 0.0249 = 0.0498.
• We reject H0 at the 5% level (but just barely).
We would reject H0 at the 5% level for any test statistic with absolute value greater than t0.025, 7 = 2.365.
The Rejection Region for a hypothesis test at the 5% level is the set of possible values of the test statistic which will result in a p-value less than or equal to 0.05.
For our example the rejection region is
(−∞, −2.365) plus (2.365, ∞)
[Figure: density of the test statistic. The blue tail areas beyond −2.365 and 2.365 sum to 0.05; the red line marks the rejection region.]
Null values µ0 for which the observed x̄ produces a test statistic that is not in the rejection region are exactly the values in the 95% confidence interval for µ.
In general, we reject the null hypothesis µ = µ0 versus the two-sided alternative at the 0.05 level of significance if and only if a 95% confidence interval for µ does not contain µ0.
In other words:
• if the 95% CI for µ contains µ0, then we fail to reject H0 : µ = µ0 at the 0.05 level (the p-value will be larger than 0.05).
• if the 95% CI for µ does not contain µ0, then we reject H0 : µ = µ0 at the 0.05 level (the p-value will be smaller than 0.05).
In our example the 95% CI for µ is
37.1 ± 2.365 × 2.51/√8 = (35.001, 39.199)
• The 95% CI for µ does not include 35 (but just barely).
• The p-value for the two sided t-test was 0.0498. We rejected the null hypothesis (but just barely).
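The one-sample example (n = 8, x̄ = 37.1, s = 2.51, µ0 = 35) can be reproduced from the summary statistics alone. The sketch below is one way to do it in R; the variable names are chosen for illustration. Because it starts from the rounded summary values, the test statistic comes out near 2.366 rather than the 2.368 quoted on the slide, which was presumably computed from unrounded data.

```r
# Summary statistics from the example
n    <- 8
xbar <- 37.1
s    <- 2.51
mu0  <- 35

se      <- s / sqrt(n)                          # estimated SD of the sample mean
t_stat  <- (xbar - mu0) / se                    # one sample t statistic
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)  # two-sided p-value

t_crit <- qt(0.975, df = n - 1)                 # t_{0.025, 7}
ci     <- xbar + c(-1, 1) * t_crit * se         # 95% CI for mu

# The two decision rules agree: |t| beyond the critical value
# if and only if mu0 lies outside the 95% CI
reject_by_test <- abs(t_stat) > t_crit
reject_by_ci   <- mu0 < ci[1] || mu0 > ci[2]

round(c(t = t_stat, p = p_value, t_crit = t_crit, lower = ci[1], upper = ci[2]), 4)
```

Changing `mu0` to any value inside the interval makes both `reject_by_test` and `reject_by_ci` become `FALSE`, which is the equivalence described above.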
Confidence intervals for the difference between means from two independent samples
Recall the previous BMD Example: We wish to determine whether maternal cigarette smoking has any effect on bone mineral density in newborns.
• Population 1: Infants of mothers who smoked during pregnancy.
  X1 = BMD for infants in population 1.
• Population 2: Infants of mothers who did not smoke during pregnancy.
  X2 = BMD for infants in population 2.
H0 : µ1 − µ2 = 0    HA : µ1 − µ2 ≠ 0
n1 = 77     x̄1 = 0.098    s1 = 0.026
n2 = 161    x̄2 = 0.095    s2 = 0.03
Find a 95% CI for the effect of maternal smoking on BMD assuming equal variances. That is, a 95% CI for µ1 − µ2.
We know:
• x̄1 − x̄2 is our best estimate for µ1 − µ2
• sp is the pooled estimate of σ1 = σ2 = σ
• the estimated standard deviation of the difference in sample means is
$$\mathrm{sd}(\bar{X}_1 - \bar{X}_2) = s_p\sqrt{1/n_1 + 1/n_2}$$
Using the same method we used for a confidence interval for the mean of a single population we can find the 95% CI for µ1 − µ2 is:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{0.025,\,df}\; s_p\sqrt{1/n_1 + 1/n_2}$$
Last time we calculated
• sp² = 0.0008278
• sp = 0.0288
• sp has 77 + 161 − 2 = 236 degrees of freedom.
• The pooled estimate of the standard deviation of X̄1 − X̄2 is
$$s_p\sqrt{1/77 + 1/161} = 0.00399$$
The final thing we need for the CI is t0.025, 236 where
Pr(T236 > t0.025, 236) = 0.025
From R or t-tables t0.025, 236 = 1.97
[Figure: T236 density with area = 0.025 to the right of 1.97.]
So the 95% CI for µ1 − µ2 (BMD smoking − BMD non smoking) is
$$(\bar{x}_1 - \bar{x}_2) \pm 1.97 \times s_p\sqrt{1/n_1 + 1/n_2} = 0.003 \pm 1.97 \times 0.00399 = 0.003 \pm 0.008 = (-0.005,\ 0.011)$$
[Figure: number line for smoking − non smoking showing the interval (−0.005, 0.011), the estimate 0.003, and 0 inside the interval.]
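The pooled-variance interval for the BMD example can be recomputed from the summary statistics. The R sketch below assumes s2 = 0.03, which is the value consistent with the pooled variance 0.0008278 quoted in the lecture; the variable names are illustrative.

```r
# Summary statistics for the BMD example
n1 <- 77;  xbar1 <- 0.098; s1 <- 0.026   # infants of mothers who smoked
n2 <- 161; xbar2 <- 0.095; s2 <- 0.030   # infants of mothers who did not smoke (s2 assumed 0.03)

df  <- n1 + n2 - 2                                    # degrees of freedom for s_p
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df       # pooled variance
sp  <- sqrt(sp2)                                      # pooled SD
se  <- sp * sqrt(1 / n1 + 1 / n2)                     # estimated SD of xbar1 - xbar2

t_crit <- qt(0.975, df)                               # t_{0.025, 236}
ci     <- (xbar1 - xbar2) + c(-1, 1) * t_crit * se    # 95% CI for mu1 - mu2

round(c(sp2 = sp2, sp = sp, se = se, t_crit = t_crit), 5)
round(ci, 3)
```

The interval contains 0, which matches the failure to reject H0 in the two-sample t-test.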
Just as in the one sample case, the Rejection Region is the set of possible values of the test statistic which will result in a p-value less than or equal to 0.05.
[Figure: density of the test statistic. The blue tail areas beyond −1.97 and 1.97 sum to 0.05; the red line marks the rejection region.]
• Any test statistic (t-statistic) that is in the rejection region will allow us to reject H0.
• Null values of µ1 − µ2 for which the observed x̄1 − x̄2 produces a test statistic that is not in the rejection region are exactly the values in the 95% confidence interval for µ1 − µ2.
Why report confidence intervals?
When we did the 2-sample t-test of differences in BMD we failed to reject H0 : the two means are equal.
This can happen for two reasons:
• H0 is true
• We failed to gather enough data to show that H0 is false
The very narrow confidence interval (relative to clinically important changes in BMD) tells us that failing to reject the null hypothesis is probably not due to small sample size.
[Figure: number line for smoking − non smoking showing the interval (−0.005, 0.011), the estimate 0.003, and 0 inside the interval.]
This is the advantage of reporting CIs as well as P-values.
It is more informative to include estimates and confidence intervals in presentations and publications as well as P-values.
Comparing Population Means for Paired Data
Example: A study was conducted to investigate whether oat bran cereal helps to lower serum cholesterol in males with high cholesterol.
• population = men with high cholesterol who do not eat oat bran in their normal diet.
• A simple random sample of size 14 was obtained.
• It is known that LDL cholesterol follows an approximately normal distribution in this population.
For each subject:
day 1: measure LDL cholesterol
next 2 weeks: change breakfast to include oat bran cereal
1st day after diet ends: measure LDL cholesterol
LDL Example continued
Define
X1 = LDL cholesterol before oat bran diet
X2 = LDL cholesterol after oat bran diet
We wish to test: H0 : µ1 − µ2 = 0 versus HA : µ1 − µ2 ≠ 0
At this point the setup looks like a two-sample t-test.
What are the assumptions for a two-sample t-test?
1. Two independent simple random samples, one from each population.
2. Approximately normally distributed sample means.
Are these met?
Here is the data:
LDL Cholesterol

Subject   Before Oat Bran (X1)   After Oat Bran (X2)
1         4.61                   3.84
2         6.42                   5.57
3         5.40                   5.85
4         4.54                   4.80
...       ...                    ...
13        2.25                   1.84
14        4.24                   4.14

Are the two means approximately normally distributed?
Are the two samples independent?
The within-person difference is what we are interested in.
The data again:
Subject   Before   After   D = After - Before
1         4.61     3.84    -0.77
2         6.42     5.57    -0.85
3         5.40     5.85     0.45
4         4.54     4.80     0.26
...       ...      ...      ...
13        2.25     1.84    -0.41
14        4.24     4.14    -0.10

The D values summarize the interesting information in the data.
If we state our null hypothesis in terms of D we can use a one sample t-test since the D measurements are independent (one from each subject).
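The reduction from paired data to a single column of differences can be sketched in R. Only the six subjects printed in the table (1 to 4, 13 and 14) are used here because the other rows are elided, so the numerical results will not match the full 14-subject analysis; the point is only that the two calls give the same test.

```r
# The six subjects shown in the table (subjects 1-4, 13, 14); the rest are elided
before <- c(4.61, 6.42, 5.40, 4.54, 2.25, 4.24)
after  <- c(3.84, 5.57, 5.85, 4.80, 1.84, 4.14)

# Within-person differences, D = After - Before
d <- after - before
d

# Paired t-test on the two columns
paired <- t.test(after, before, paired = TRUE)

# One sample t-test of H0: mu_D = 0 on the differences
one_sample <- t.test(d, mu = 0)

# Same t statistic, degrees of freedom and p-value from both calls
c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
c(paired = paired$p.value, one_sample = one_sample$p.value)
```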
What is our null hypothesis?
We wish to know if eating oat bran changes cholesterol so we want to test
H0 : µD = 0 versus HA : µD ≠ 0
This has exactly the same form as hypotheses for a 1 sample t-test
H0 : µ = µ0 versus HA : µ ≠ µ0
if we set µ0 = 0.
The one sample t-test of the differences is called a paired t-test.
To conduct the test we need D̄ and sD, the sample mean and standard deviation of the differences.
Here are some statistics calculated from the data:

                       Sample Mean    Sample SD    n
X1 = LDL before        x̄1 = 4.44      s1 = 0.97    n1 = 14
X2 = LDL after         x̄2 = 4.08      s2 = 1.06    n2 = 14
D = after − before     D̄ = −0.36      sD = 0.41    nD = 14
  = X2 − X1

Note: D̄ = x̄2 − x̄1 and n1 = n2 = nD.
However, sD cannot be calculated from s1 and s2 alone.
We only need the statistics in the third row for our test.
What assumptions are we making?
The assumptions of a one sample t-test applied to the differences.
• The differences are measured on a SRS from the population.
• The distribution of the differences is relatively symmetric.
Recall we are testing:
H0 : µD = 0 versus HA : µD ≠ 0
The test statistic is
$$t = \frac{\bar{D} - 0}{s_D/\sqrt{n_D}} = \frac{-0.36 - 0}{0.41/\sqrt{14}} = -3.29$$
$$\text{P-value} = 2 \times \Pr(T_{14-1} > |-3.29|) = 2 \times \Pr(T_{13} > 3.29) = 2 \times 0.0030 = 0.0060$$
We have strong evidence against H0 and for the claim that oat bran cereal lowers LDL cholesterol.
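The paired test statistic and its p-value can be checked from the three summary numbers for D. A minimal R sketch, with illustrative variable names:

```r
# Summary statistics for D = after - before
d_bar <- -0.36   # sample mean of the differences
s_d   <- 0.41    # sample SD of the differences
n_d   <- 14      # number of pairs

se      <- s_d / sqrt(n_d)        # estimated SD of the mean difference
t_stat  <- (d_bar - 0) / se       # test statistic for H0: mu_D = 0

# Two-sided p-value from the t distribution with n_d - 1 = 13 df
p_value <- 2 * pt(abs(t_stat), df = n_d - 1, lower.tail = FALSE)

round(c(t = t_stat, p = p_value), 4)
```

Small differences from the slide values in the last digit are expected because the inputs here are rounded.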
Paired t-tests summary
Definition: A sample is made up of sampling units (often subjects but can be families, hospitals, etc.)
A paired t-test is used when the measurements are paired because the sampling units in the SRS are measured twice. For example:
• SRS of patients - before and after measurements of each.
• SRS of families - two children measured from each.
• SRS of litters - two mice measured from each.
• SRS of rats - two eyes measured from each.
The key is to figure out:
• What are the sampling units: people, families, litters, rats, . . .
• How many measurements are taken on each sampling unit.
2 SRSs, 1 measurement on each sampling unit → 2 sample t-test.
1 SRS randomized into 2 groups, 1 measurement on each sampling unit → 2 sample t-test.
1 SRS, 2 measurements on each sampling unit → paired t-test.
Confidence interval for µD (the difference between paired variables)
Recall:

                       Sample Mean    Sample SD    n
X1 = LDL before        x̄1 = 4.44      s1 = 0.97    n1 = 14
X2 = LDL after         x̄2 = 4.08      s2 = 1.06    n2 = 14
D = after − before     D̄ = −0.36      sD = 0.41    nD = 14
  = X2 − X1

Just like the paired hypothesis test, the CI for the mean difference requires only the information in the third line.
The 95% CI for a paired sample is the same as a one sample 95% CI for the mean of the variable D.
$$\bar{D} \pm t_{0.025,\,df} \times \frac{s_D}{\sqrt{n_D}}$$
df = 14 − 1 = 13
t0.025, 13 = 2.16
[Figure: T13 density with area = 0.025 to the right of t0.025, 13 = 2.16.]
So the 95% CI for µD (after − before) is
$$\bar{x}_D \pm t_{0.025,\,df}\; s_D/\sqrt{n_D} = -0.36 \pm 2.16 \times 0.41/\sqrt{14} = -0.36 \pm 0.24 = (-0.60,\ -0.12)$$
[Figure: number line for D showing the interval (−0.60, −0.12), the estimate −0.36, and 0 outside the interval.]
We have strong evidence against the null hypothesis but a fairly wide confidence interval.
If we want a narrower confidence interval we need a larger sample.
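The interval for µD, and the effect of sample size on its width, can be sketched in R as follows. The second calculation is hypothetical: it assumes a sample four times as large with the same sD = 0.41, which is an assumption for illustration and not a result from the study.

```r
# Summary statistics for D = after - before
d_bar <- -0.36
s_d   <- 0.41
n_d   <- 14

t_crit <- qt(0.975, df = n_d - 1)         # t_{0.025, 13}
margin <- t_crit * s_d / sqrt(n_d)        # half-width of the 95% CI
ci     <- d_bar + c(-1, 1) * margin       # 95% CI for mu_D
round(ci, 2)

# Hypothetical: four times as many pairs, same s_d (assumed)
n_big      <- 4 * n_d
margin_big <- qt(0.975, df = n_big - 1) * s_d / sqrt(n_big)
round(c(margin_n14 = margin, margin_n56 = margin_big), 3)
```

Quadrupling n divides √n-based width by two, and the smaller t critical value narrows the interval slightly further.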
Summary: t-tests and confidence intervals for the mean or mean difference
Parameter of interest is the Population Mean
Population parameter: Population mean = $\mu$
Sample estimate: Sample mean = $\bar{x}$
SD of estimate: $\mathrm{sd}(\bar{x}) = \mathrm{sd}(x)/\sqrt{n} = s/\sqrt{n}$
Confidence interval: $\bar{x} \pm t_{\alpha/2,\,n-1}\,\mathrm{sd}(\bar{x})$
Hypothesis test: One sample t-test
Hypotheses: $H_0 : \mu = \mu_0$ vs. $H_A : \mu \neq \mu_0$
Test Statistic: $t = (\bar{x} - \mu_0)/\mathrm{sd}(\bar{x})$
P-value (2-sided): $2 \times \Pr(T_{n-1} > |t|)$
Assumptions: SRS & $\bar{X}$ is approximately normally distributed
Parameter of interest is the Difference between two population means (independent samples)
Population parameter: Difference in population means = $\mu_1 - \mu_2$
Sample estimate: Difference in sample means = $\bar{x}_1 - \bar{x}_2$
SD of estimate, equal variances: $\mathrm{sd}(\bar{x}_1 - \bar{x}_2) = s_p\sqrt{1/n_1 + 1/n_2}$
  where $s_p = \sqrt{[(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n_1 + n_2 - 2)}$ and $df = n_1 + n_2 - 2$
SD of estimate, unequal variances: $\mathrm{sd}(\bar{x}_1 - \bar{x}_2) = \sqrt{s_1^2/n_1 + s_2^2/n_2}$
  where $df = (r_1 + r_2)^2/[r_1^2/(n_1 - 1) + r_2^2/(n_2 - 1)]$, with $r_1 = s_1^2/n_1$ and $r_2 = s_2^2/n_2$
Confidence interval: $\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,df}\,\mathrm{sd}(\bar{x}_1 - \bar{x}_2)$
Hypothesis test: Two-sample t-test for independent samples
Hypotheses: $H_0 : \mu_1 - \mu_2 = 0$ vs. $H_A : \mu_1 - \mu_2 \neq 0$
Test Statistic: $t = (\bar{x}_1 - \bar{x}_2)/\mathrm{sd}(\bar{x}_1 - \bar{x}_2)$
P-value (2-sided): $2 \times \Pr(T_{df} > |t|)$
Assumptions: independent SRSs from 2 populations, or one SRS randomized to two groups, & $\bar{X}_1$ and $\bar{X}_2$ are approximately normally distributed
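The unequal-variance formulas can be collected into a small R helper. This is a sketch: the function name `welch_from_summary` and its argument names are hypothetical, and the example call reuses the BMD summary statistics with s2 assumed to be 0.03. In practice `t.test(..., var.equal = FALSE)` does the same calculation from raw data.

```r
# Unequal-variance (Welch) two-sample calculations from summary statistics.
# welch_from_summary is a hypothetical helper name used only in this sketch.
welch_from_summary <- function(xbar1, s1, n1, xbar2, s2, n2, conf = 0.95) {
  r1 <- s1^2 / n1
  r2 <- s2^2 / n2
  se <- sqrt(r1 + r2)                                     # sd(xbar1 - xbar2)
  df <- (r1 + r2)^2 / (r1^2 / (n1 - 1) + r2^2 / (n2 - 1)) # approximate df
  t_stat  <- (xbar1 - xbar2) / se
  p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # two-sided p-value
  t_crit  <- qt(1 - (1 - conf) / 2, df)
  ci <- (xbar1 - xbar2) + c(-1, 1) * t_crit * se
  list(se = se, df = df, t = t_stat, p_value = p_value, ci = ci)
}

# BMD example (s2 = 0.03 assumed)
res <- welch_from_summary(0.098, 0.026, 77, 0.095, 0.030, 161)
round(unlist(res), 4)
```

The approximate df always falls between the smaller of n1 − 1 and n2 − 1 and the pooled value n1 + n2 − 2, so it is usually not a whole number.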
Parameter of interest is the Mean of paired differences
Population parameter: Population mean difference = $\mu_D$
Sample estimate: Sample mean difference = $\bar{x}_D$
SD of estimate: $\mathrm{sd}(\bar{x}_D) = \mathrm{sd}(x_D)/\sqrt{n_D} = s_D/\sqrt{n_D}$
Confidence interval: paired t-interval = $\bar{x}_D \pm t_{\alpha/2,\,n_D-1}\,\mathrm{sd}(\bar{x}_D)$
Hypothesis test: Paired t-test
Hypotheses: $H_0 : \mu_D = \mu_0$ vs. $H_A : \mu_D \neq \mu_0$
Test Statistic: $t = (\bar{x}_D - \mu_0)/\mathrm{sd}(\bar{x}_D)$
P-value (2-sided): $2 \times \Pr(T_{n_D-1} > |t|)$
Assumptions: pairs are a SRS & $\bar{D}$ is approximately normally distributed
Note: This summary is available on the home page under “Handouts” as “formula summary”.
R Commander: One sample t-test and confidence interval for the population mean
A new data set: fasting.glucose
Glucose blood level (mg/100ml) after a 12 hour fast for a simple random sample of 70 women.
We wish to test the null hypothesis that the mean fasting glucose for women is 75.
H0 : µ = 75    HA : µ ≠ 75
First plot the histogram of glucose levels:
Quite symmetric.
To do the 1-sample t-test:
In Rcmdr:
• Statistics → Means → Single-sample t-test
• Set the “Null hypothesis: mu =” to 75
• Set the “Confidence Level:” to .95 (default)
• For a two sided test check the Alternative Hypothesis “Population mean != mu0” (default)
The output from R:
> with(fasting.glucose, (t.test(glucose, alternative='two.sided',
+ mu=75, conf.level=.95)))
One Sample t-test
data: glucose
t = 2.0335, df = 69, p-value = 0.04585
alternative hypothesis: true mean is not equal to 75
95 percent confidence interval:
75.05654 80.91489
sample estimates:
mean of x
77.98571
Moderate evidence against the null hypothesis.
R Commander: Two sample t-test and confidence interval for the difference between two population means
Another new data set: birth.rate
Birth rates (per 1000 residential population) for SRSs of counties in California and Maine.
23 counties in California and 19 in Maine.
Reference: County and City Data Book 12th edition, U.S. Dept. of Commerce
We wish to test whether the birth rate is the same in California and Maine.
First plot stacked histograms (use the grouped option)
Reasonably symmetric. The variances do not look equal.
The hypotheses are
H0 : µC = µM    HA : µC ≠ µM
To do the 2-sample t-test:
In Rcmdr:
• Statistics → Means → Independent samples t-test
• Choose the “Groups variable” to be state
• Choose the “Response variable” to be births.per.1000
• Click on the Options tab and
  - choose “Two-sided” (default)
  - set the “Confidence Level:” to .95 (default)
  - “Assume equal variances?” choose “No” (default)
The output from R:
> t.test(births.per.1000~state, alternative='two.sided',
+ conf.level=.95, var.equal=FALSE, data=birth.rate)
Welch Two Sample t-test
data: births.per.1000 by state
t = 4.3467, df = 27.447, p-value = 0.0001708
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.684221 4.691523
sample estimates:
mean in group California mean in group Maine
17.18261 13.99474
Confidence interval is for:
California mean birth rate - Maine mean birth rate
Very strong evidence against the null hypothesis.
R Commander: Paired t-test and confidence interval for the mean difference
Yet another new data set: platelet data set from Shahbaba.
We have measurements on platelet aggregation before and after smoking on 11 individuals.
H0 : µA − µB = 0    HA : µA − µB ≠ 0
A stands for “After”, B stands for “Before”
We need to plot the data but what should we plot?
What are we assuming?
What plot do we need?
To plot the differences we first have to create them:
Data → Manage variables in active data set → Compute new variable...
New variable name: difference
Expression to compute: After - Before
Then create a histogram of difference.
Looks reasonably symmetric.
To do the paired t-test
In Rcmdr:
• Statistics → Means → Paired t-test
• Choose the “First variable” to be After
• Choose the “Second variable” to be Before
  Note that the differences will be calculated as “First variable” − “Second variable”
• Click on the Options tab and
  - choose “Two-sided” (default)
  - set the “Confidence Level:” to .95 (default)
The output from R:
> t.test(platelet$After, platelet$Before, alternative='two.sided',
+ conf.level=.95, paired=TRUE)
Paired t-test
data: After and Before
t = 4.2716, df = 10, p-value = 0.001633
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.91431 15.63114
sample estimates:
mean of the differences
10.27273
Mean and CI are for the differences.
Strong evidence against the null hypothesis.
Alternate calculation of paired t-test and confidence interval for the mean difference
We can do the paired t-test using the one-sample t-test on the differences
In Rcmdr:
• Statistics → Means → Single-sample t-test
• Choose the “Variable” to be difference
• Click on the Options tab and
- Set the “Null hypothesis: mu =” to 0 (default)
- Set the “Confidence Level:” to .95 (default)
- For a two sided test check the Alternative Hypothesis “Population mean != mu0” (default)
The output from R:
> with(platelet, (t.test(difference, alternative='two.sided',
+ mu=0.0, conf.level=.95)))
One Sample t-test
data: difference
t = 4.2716, df = 10, p-value = 0.001633
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
4.91431 15.63114
sample estimates:
mean of x
10.27273
Same mean, confidence interval and p-value as the previous paired t-test.