STAT E-102 Midterm Review
March 14, 2007


Review Topics: Class 1 (Ch. 1, 2)

Populations and samples
  Parameters (usually unknown) and statistics
Types of data
  i.e., nominal, ordinal, discrete, continuous
Data Summaries
  Graphs (bar charts, histograms, box plots...)
  Frequency Tables


Review Topics: Class 2 (Ch. 6.1-6.4)

Probability
  Intersection, union, complement, null event, mutually exclusive, independent
  P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Conditional Probability
  P(B|A) = P(A ∩ B) / P(A)
  Sensitivity, specificity, predictive values, p-value
Bayes’ Theorem
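The addition rule and the definition of conditional probability can be checked by brute force on a small sample space where every outcome is equally likely. This is a minimal Python sketch; the fair die and the events A (even roll) and B (roll of 4 or more) are hypothetical examples, not taken from the course.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (hypothetical example).
OMEGA = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of OMEGA) when all outcomes are equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(b, a):
    """P(B|A) = P(A ∩ B) / P(A); undefined when P(A) = 0."""
    if prob(a) == 0:
        raise ValueError("P(A) must be positive")
    return prob(a & b) / prob(a)

A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is at least 4

union_direct = prob(A | B)                   # count outcomes in A ∪ B directly
union_rule = prob(A) + prob(B) - prob(A & B)  # addition rule
print(union_direct, union_rule)   # 2/3 2/3
print(cond_prob(B, A))            # 2/3
```

Exact fractions avoid rounding, so the two sides of the addition rule compare equal.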


Practice Test Question #2

The probabilities that a 25- to 34-year-old U.S. male's cholesterol level falls in each of the following intervals are listed below.

What is the probability that a male from this population has cholesterol <200 mg/dL?

A) 0.414  B) 0.567  C) 0.847  D) 0.280

Cholesterol (mg/dL)   Probability
80-119                0.012
120-159               0.141
160-199               0.414
200-239               0.280
240-279               0.108
280-319               0.032
320-359               0.008
360-399               0.005


Practice Test Question #2: Solution

These intervals are mutually exclusive, so their probabilities add:

P(<200) = P(80-119) + P(120-159) + P(160-199) = 0.012 + 0.141 + 0.414 = 0.567

Answer: B) 0.567
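Because the cholesterol intervals are mutually exclusive, P(<200) is a sum over the rows below the cutoff. A minimal Python sketch of that arithmetic follows; the dictionary layout and the helper name prob_below are illustrative choices, and the helper assumes the cutoff falls on an interval boundary (such as 200 or 240). Checking that all eight probabilities sum to 1 is a quick sanity test on the table.

```python
# Interval probabilities for cholesterol (mg/dL), keyed by (low, high).
# Interval labels assume 40 mg/dL-wide intervals; the probabilities sum to 1.
cholesterol_probs = {
    (80, 119): 0.012,
    (120, 159): 0.141,
    (160, 199): 0.414,
    (200, 239): 0.280,
    (240, 279): 0.108,
    (280, 319): 0.032,
    (320, 359): 0.008,
    (360, 399): 0.005,
}

def prob_below(cutoff, table=cholesterol_probs):
    """P(X < cutoff): add the probabilities of intervals lying entirely below cutoff.

    Adding is valid because the intervals are mutually exclusive.
    Assumes cutoff falls on an interval boundary."""
    return sum(p for (low, high), p in table.items() if high < cutoff)

print(round(sum(cholesterol_probs.values()), 3))  # 1.0
print(round(prob_below(200), 3))                  # 0.567
```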


Practice Test Question #3

The probabilities that a 25- to 34-year-old U.S. male's cholesterol level falls in each interval are listed in the table for Question #2.

Given that a man from this population has a serum cholesterol level <240 mg/dL, compute the conditional probability that his cholesterol is <200 mg/dL.

A) 0.669  B) 0.567  C) 0.331  D) 0.280


Practice Test Question #3: Solution

A = cholesterol <240, B = cholesterol <200

P(B|A) = P(A ∩ B) / P(A)

P(A ∩ B) = P(<240 and <200) = P(<200) = 0.567 (calculated in Question #2), because every value below 200 is also below 240

P(A) = P(<240) = 0.012 + 0.141 + 0.414 + 0.280 = 0.847

P(B|A) = 0.567 / 0.847 = 0.669

Answer: A) 0.669
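The same conditional-probability calculation as a short Python sketch. Only the four intervals below 240 mg/dL are needed; conditional_prob is a hypothetical helper name for P(B|A) = P(A ∩ B) / P(A).

```python
# Probabilities of the four cholesterol intervals below 240 mg/dL (Question #2 table).
below_240 = {
    (80, 119): 0.012,
    (120, 159): 0.141,
    (160, 199): 0.414,
    (200, 239): 0.280,
}

def conditional_prob(p_a_and_b, p_a):
    """P(B|A) = P(A ∩ B) / P(A)."""
    if p_a <= 0:
        raise ValueError("P(A) must be positive")
    return p_a_and_b / p_a

# A = cholesterol <240, B = cholesterol <200.
p_a = sum(below_240.values())
# Every man below 200 is also below 240, so A ∩ B = B.
p_a_and_b = sum(p for (low, high), p in below_240.items() if high < 200)
p_b_given_a = conditional_prob(p_a_and_b, p_a)
print(round(p_a, 3), round(p_a_and_b, 3), round(p_b_given_a, 3))  # 0.847 0.567 0.669
```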


Review Topics: Class 2 (Ch. 6.1-6.4)

Evaluating a Screening Test
  Sensitivity = P(T+|D+), Specificity = P(T-|D-)
  False Positive Rate = 1 - Specificity
  False Negative Rate = 1 - Sensitivity
  Positive Predictive Value = P(D+|T+)
  Negative Predictive Value = P(D-|T-)
ROC Curves
  plots of sensitivity vs. 1 - specificity (the false positive rate), one point per test cutoff
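Each point on an ROC curve comes from one cutoff for calling a test positive. The Python sketch below assumes a test is positive when score >= cutoff and uses invented scores for five diseased and five healthy subjects; it lists (1 - specificity, sensitivity) pairs which, plotted in order, trace the curve from (0, 0) to (1, 1). The function name roc_points is illustrative.

```python
def roc_points(scores_diseased, scores_healthy):
    """Return (1 - specificity, sensitivity) pairs, one per candidate cutoff.

    A subject tests positive when score >= cutoff (an assumed convention)."""
    cutoffs = sorted(set(scores_diseased) | set(scores_healthy), reverse=True)
    points = [(0.0, 0.0)]  # a cutoff above every score: nobody tests positive
    for c in cutoffs:
        sens = sum(s >= c for s in scores_diseased) / len(scores_diseased)  # P(T+|D+)
        spec = sum(s < c for s in scores_healthy) / len(scores_healthy)     # P(T-|D-)
        points.append((1 - spec, sens))
    return points

# Hypothetical test scores, invented for illustration.
diseased = [7, 8, 9, 6, 9]
healthy = [3, 5, 6, 2, 4]
pts = roc_points(diseased, healthy)
for fpr, tpr in pts:
    print(f"1 - spec = {fpr:.2f}, sens = {tpr:.2f}")
```

Lowering the cutoff can only raise sensitivity and lower specificity, which is why the points move up and to the right.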


Practice Test Question #4

The following table summarizes the performance of a diagnostic test. Given that the prevalence of this condition in the overall population is 5%, answer the following questions.

What are the sensitivity and specificity of this test?

A) 0.734 and 0.918  B) 0.918 and 0.734  C) 0.833 and 0.813  D) 0.813 and 0.833

              + Diagnosis   - Diagnosis   Total
Disease                50            10      60
No Disease            103           449     552
Total                 153           459     612


Practice Test Question #4: Solution

Sensitivity = P(T+|D+) = 50/60 = 0.833

Specificity = P(T-|D-) = 449/552 = 0.813

Answer: C) 0.833 and 0.813
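The two ratios in the solution, plus the false positive and false negative rates, can be computed directly from the table counts. A minimal Python sketch; the function name screening_metrics and its argument names are illustrative.

```python
def screening_metrics(tp, fn, fp, tn):
    """Summaries of a diagnostic test from a 2x2 table of counts.

    tp: disease, test +    fn: disease, test -
    fp: no disease, test + tn: no disease, test -"""
    sensitivity = tp / (tp + fn)   # P(T+|D+)
    specificity = tn / (tn + fp)   # P(T-|D-)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
        "false_negative_rate": 1 - sensitivity,
    }

# Counts from the Question #4 table.
m = screening_metrics(tp=50, fn=10, fp=103, tn=449)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```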


Practice Test Question #5

For the diagnostic test summarized in Question #4, with a 5% prevalence of this condition in the overall population: what is the probability that a subject has the disease given that the test was positive?

A) 19%  B) 81%  C) 32%  D) 68%


Practice Test Question #5: Solution

Prevalence = 0.05
Positive Predictive Value = P(D+|T+)

Bayes’ Theorem:

$$P(D^+ \mid T^+) = \frac{P(D^+)\,P(T^+ \mid D^+)}{P(D^+)\,P(T^+ \mid D^+) + P(D^-)\,P(T^+ \mid D^-)} = \frac{(\text{Prev})(\text{Sens})}{(\text{Prev})(\text{Sens}) + (1-\text{Prev})(1-\text{Spec})}$$

$$= \frac{(0.05)(0.833)}{(0.05)(0.833) + (0.95)(0.187)} = \frac{0.042}{0.042 + 0.178} = 0.191$$

Answer: A) 19%
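Bayes’ theorem turns sensitivity, specificity, and prevalence into predictive values. The table's own disease proportion (60/612, about 9.8%) differs from the stated 5% population prevalence, which is why the PPV comes from Bayes’ theorem rather than from 50/153 (about 32%, one of the distractors). This Python sketch, with the hypothetical helper name predictive_values, also computes the negative predictive value. With unrounded sensitivity and specificity the PPV comes out near 0.190, slightly below the 0.191 obtained from rounded intermediate values.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """PPV = P(D+|T+) and NPV = P(D-|T-) via Bayes' theorem."""
    # P(T+) by the law of total probability.
    p_pos = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    ppv = prevalence * sensitivity / p_pos
    # P(T-) likewise, for the negative predictive value.
    p_neg = (1 - prevalence) * specificity + prevalence * (1 - sensitivity)
    npv = (1 - prevalence) * specificity / p_neg
    return ppv, npv

sens = 50 / 60     # from the Question #4 table
spec = 449 / 552
ppv, npv = predictive_values(0.05, sens, spec)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")   # PPV = 0.190, NPV = 0.989
```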


Review Topics: Class 3 (Ch. 3.1, 3.2, 3.3, 3.5, 7.1, 7.4)

Summary Statistics
  Central tendency: mean, median, mode
  Variability: variance, standard deviation, range, interquartile range
Variable, random variable, continuous random variable
Probability Distribution
  Describes the behavior of a random variable
  Normal distribution: N(μ, σ)
  Standard Normal Distribution: N(0, 1)
Z-statistic

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 \qquad Z = \frac{X - \mu}{\sigma}$$
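The summary statistics listed above can be computed with Python's standard statistics module. A minimal sketch on a small invented sample: statistics.variance and statistics.stdev use the n - 1 divisor from the formula for s², and the hand-written check confirms that. Quartile conventions differ between textbooks and software, so an IQR from statistics.quantiles may not match a hand calculation exactly.

```python
import statistics

# A small invented sample (hypothetical data, mm Hg).
x = [62, 70, 75, 68, 81, 70, 59]

mean = statistics.mean(x)            # X̄ = (1/n) Σ X_i
median = statistics.median(x)
mode = statistics.mode(x)
var = statistics.variance(x)         # s², divisor n - 1
sd = statistics.stdev(x)             # s = √s²
data_range = max(x) - min(x)
q1, _, q3 = statistics.quantiles(x, n=4)  # default 'exclusive' quartile method
iqr = q3 - q1

# Check statistics.variance against the s² formula written out by hand.
manual_var = sum((xi - mean) ** 2 for xi in x) / (len(x) - 1)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={var:.2f} sd={sd:.2f} range={data_range} IQR={iqr}")
```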


Practice Test Question #15

A large study has determined that the diastolic blood pressure among women ages 18-74 is normally distributed with mean 70 mm Hg and standard deviation σ = 10 mm Hg.

What is the probability that a randomly chosen woman will have diastolic blood pressure lower than 89.6 mm Hg?

A) 2.5%  B) 5%  C) 97.5%  D) 95%


Practice Test Question #15: Solution

Z = (X - µ) / σ = (89.6 - 70) / 10 = 1.96

P(X < 89.6) = P(Z < 1.96) = 1 - P(Z > 1.96) = 1 - 0.025 = 0.975

Or, in STATA: display normprob(1.96), which returns 0.975

Answer: C) 97.5%
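The Z-table lookup can also be reproduced with Python's statistics.NormalDist (Python 3.8+), as an alternative to the Stata command. A minimal sketch:

```python
from statistics import NormalDist

dbp = NormalDist(mu=70, sigma=10)   # diastolic blood pressure, mm Hg
z = (89.6 - dbp.mean) / dbp.stdev   # Z = (X - µ) / σ
p_below = dbp.cdf(89.6)             # P(X < 89.6)
print(round(z, 2), round(p_below, 3))   # 1.96 0.975
```

Standardizing first and calling NormalDist().cdf(z) gives the same probability, which is exactly what the Z table does.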


Review Topics: Class 3 (Ch. 8.1-8.3)

Central Limit Theorem
  Take many samples of size n from a population.
  The sample means, X̄, create a sampling distribution.
  The standard deviation of the sample means is σ/√n, which is called the standard error of the mean.
  When n is large, the sampling distribution is approximately normal, and

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

  approximately follows the standard normal distribution N(0, 1).
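A small simulation makes the Central Limit Theorem concrete. This Python sketch assumes a Uniform(0, 1) population (a hypothetical, deliberately non-normal choice with µ = 0.5 and σ = 1/√12), draws many samples of size 25, and compares the standard deviation of the sample means with σ/√n. It also checks that about 95% of the sample means fall within 1.96 standard errors of µ, as a normal sampling distribution predicts.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so repeated runs give the same numbers

n = 25          # sample size
reps = 20_000   # number of repeated samples
mu = 0.5                    # mean of the hypothetical Uniform(0, 1) population
sigma = 1 / math.sqrt(12)   # its standard deviation

# Draw many samples of size n and keep each sample's mean.
sample_means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

se_theory = sigma / math.sqrt(n)               # standard error of the mean, σ/√n
se_simulated = statistics.stdev(sample_means)  # SD of the simulated sample means

# Share of sample means within 1.96 standard errors of µ (≈ 0.95 if X̄ is close to normal).
within = sum(abs(xbar - mu) < 1.96 * se_theory for xbar in sample_means) / reps

print(f"SE theory {se_theory:.4f}, SE simulated {se_simulated:.4f}, within ±1.96 SE: {within:.3f}")
```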


Practice Test Question #16

A large study has determined that the diastolic blood pressure among women ages 18-74 is normally distributed with mean 70 mm Hg and standard deviation σ = 10 mm Hg.

Suppose that you measure the diastolic blood pressure in n = 25 women. What is the sampling distribution of X̄₂₅, the mean diastolic blood pressure of the women in the sample?

A) Normal, µ_X̄ = 70 mmHg and σ_X̄ = 10 mmHg
B) Normal, µ_X̄ = 7 mmHg and σ_X̄ = 10 mmHg
C) Normal, µ_X̄ = 70 mmHg and σ_X̄ = 2 mmHg
D) Normal, µ_X̄ = 7 mmHg and σ_X̄ = 2 mmHg


Practice Test Question #16: Solution

If the underlying distribution is normal, the sampling distribution of the sample mean is also normal, for any sample size n.

µ_X̄ = µ = 70 mmHg
σ_X̄ = σ / √n = 10 / √25 = 2 mmHg

Answer: C) Normal, µ_X̄ = 70 mmHg and σ_X̄ = 2 mmHg


Practice Test Question #17

A large study has determined that the diastolic blood pressure among women ages 18-74 is normally distributed with mean 70 mm Hg and standard deviation σ = 10 mm Hg.

What is the lower 5th percentile of the sampling distribution of X̄₂₅ from Question #16 (i.e., the value below which the mean diastolic blood pressure of a 25-woman sample will fall 5% of the time)?

A) 66.08 mmHg  B) 66.71 mmHg  C) 50.40 mmHg  D) 27.55 mmHg


Practice Test Question #17: Solution

From the Z table, Z = -1.645 cuts off the lower 5th percentile. Alternatively, get this in STATA: display invnorm(.05)

Rearrange the Z equation and solve for X̄:

Z = (X̄ - µ_X̄) / σ_X̄

X̄ = (Z)(σ_X̄) + µ_X̄ = (-1.645)(2) + 70 = 66.71

Answer: B) 66.71 mmHg
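The same percentile can be found with Python's statistics.NormalDist, either through the standard normal quantile (the Z-table route) or directly from the sampling distribution N(70, 2). A minimal sketch:

```python
import math
from statistics import NormalDist

se = 10 / math.sqrt(25)                  # σ_X̄ = σ / √n = 2 mm Hg
xbar_dist = NormalDist(mu=70, sigma=se)  # sampling distribution of X̄₂₅
z_05 = NormalDist().inv_cdf(0.05)        # standard normal 5th percentile
cutoff = xbar_dist.inv_cdf(0.05)         # same as 70 + z_05 * se
print(round(z_05, 3), round(cutoff, 2))  # -1.645 66.71
```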


Review Topics: Class 4 (Ch. 10.1-10.3)

Statistical Inference
Hypothesis Testing
  Null Hypothesis (H0): no effect or no difference
  Alternative Hypothesis (HA): an effect or difference; what you are trying to prove
  Court Trial Example
P-values
  Given that H0 is true, the probability of observing a result as extreme as, or more extreme than, the one observed
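The p-value definition can be made concrete with a one-sample z-test (known σ). This Python sketch uses hypothetical data: a sample of 25 women with mean diastolic blood pressure 66 mm Hg, testing H0: µ = 70 against a two-sided alternative with σ = 10. The function name z_test_p_value is illustrative.

```python
import math
from statistics import NormalDist

def z_test_p_value(xbar, mu0, sigma, n):
    """Two-sided p-value for H0: µ = mu0 with known σ (one-sample z-test).

    Computed assuming H0 is true: the probability of a sample mean at least
    as far from mu0 as xbar, in either direction."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return z, 2 * upper_tail

# Hypothetical data: 25 women, sample mean 66 mm Hg; H0: µ = 70, σ = 10.
z, p = z_test_p_value(xbar=66, mu0=70, sigma=10, n=25)
print(f"z = {z:.2f}, p = {p:.4f}")   # z = -2.00, p = 0.0455
```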


Practice Test Question #6

A small p-value for a hypothesis test (i.e., <0.05) signifies which of the following?

A) The conditional probability that the null hypothesis is true is <0.05
B) The probability that the alternative hypothesis is true is >0.95
C) The conditional probability of the data being this extreme if the null hypothesis is true is <0.05
D) The conditional probability of the data being this extreme if the alternative hypothesis is true is >0.05


Practice Test Question #6: Solution

Answer: C) The conditional probability of the data being this extreme if the null hypothesis is true is <0.05

A p-value is computed assuming H0 is true; it is not the probability that H0 (choice A) or HA (choice B) is true.


Review Topics—Class 4 Ch. 10.4, 10.5

Type I Error ($\alpha$): P(rejecting $H_0$ | $H_0$ is true), the probability of rejecting $H_0$ when we should not

Type II Error ($\beta$): P(failing to reject $H_0$ | $H_A$ is true)

Power = $1-\beta$ = P(rejecting $H_0$ | $H_A$ is true), the probability of rejecting $H_0$ when we should

Hypothesis Testing Steps
1) State $H_0$
2) State $H_A$
3) Determine $\alpha$
4) Determine the test statistic and associated p-value
5) Determine whether to reject or fail to reject $H_0$
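The meaning of $\alpha$ can be checked by simulation. The sketch below assumes a one-sample, two-sided z-test with known $\sigma$ (chosen for simplicity); the sample size, seed, and function name are hypothetical. It repeatedly draws data with $H_0$ true and counts how often $H_0$ is wrongly rejected, which should happen in roughly a fraction $\alpha$ of samples.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(n_sims: int = 20000, n: int = 25, alpha: float = 0.05,
                      mu0: float = 0.0, sigma: float = 1.0, seed: int = 1) -> float:
    """Fraction of samples, simulated with H0 true, in which a two-sided z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(n_sims):
        # Generate data under H0: the true mean really is mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # a Type I error: rejecting a true H0
    return rejections / n_sims

rate = type_i_error_rate()
print(f"estimated P(reject H0 | H0 true) = {rate:.3f}")  # should be close to alpha = 0.05
```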


Practice Test Question #18

Consider the following: a hypothesis test was performed. The resulting p-value was higher than the pre-specified alpha, and thus there was a failure to reject the null hypothesis. Which of the following are correct?

A) The null hypothesis must be true
B) An evaluation of power and type II error may be warranted
C) It is still feasible that the alternative hypothesis is true
D) Both B and C


Practice Test Question #18

We haven't proven the null by failing to reject it. It is still possible that the alternative is actually true.

Recall that power ($1-\beta$) is the probability of rejecting the null when it is false. $\beta$ is the probability of making a Type II error (failing to reject the null when it is false). In this case, statistical power may have been too low, leading to a Type II error in which we failed to reject $H_0$ even though it is actually false. An evaluation of power is appropriate.

Answer: D) Both B and C
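To see how low power can explain a failure to reject, the sketch below computes the power of a two-sided one-sample z-test with known $\sigma$, a simplifying assumption; the effect size, sample sizes, and function name are hypothetical. Under $H_A$ the z-statistic is centred at $\delta\sqrt{n}/\sigma$ instead of 0, so power is the probability that it lands beyond either critical value.

```python
from statistics import NormalDist

def z_test_power(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a two-sided one-sample z-test when the true mean is mu0 + delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * n ** 0.5 / sigma  # centre of the z-statistic under HA
    # Reject when Z > z_crit or Z < -z_crit, where Z ~ N(shift, 1) under HA.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical effect: the true mean differs from mu0 by half a standard deviation.
for n in (5, 10, 20, 40):
    power = z_test_power(delta=0.5, sigma=1.0, n=n)
    print(f"n = {n:2d}: power = {power:.2f}, beta = {1 - power:.2f}")
```

With a small sample, a real effect of this size is missed most of the time ($\beta$ is large), which is why a non-significant result calls for an evaluation of power rather than a conclusion that $H_0$ is true.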



Review Topics—Class 5 Ch. 9.1, 9.2

Confidence Intervals
Range of values associated with a parameter
Calculated using the sample data
Over repeated sampling, will cover the true parameter a specified percentage of the time (e.g., 95%)
Understand what affects confidence interval width: a smaller sample size, a higher confidence level (e.g., 99%), and larger sample variability all make the interval wider

1-sided:
$(-\infty, \text{upper bound})$ used when $H_0: \mu \ge 0$, $H_A: \mu < 0$
$(\text{lower bound}, \infty)$ used when $H_0: \mu \le 0$, $H_A: \mu > 0$

2-sided: $(\text{lower bound}, \text{upper bound})$ used when $H_0: \mu = 0$, $H_A: \mu \ne 0$
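The coverage interpretation can be checked by simulation. This sketch assumes normal data with known $\sigma$, so the z-interval applies, and uses hypothetical values for $\mu$, $\sigma$, $n$, and the seed. Across many repeated samples, roughly 95% of the 95% intervals (and roughly 99% of the wider 99% intervals) should contain the true mean.

```python
import random
from statistics import NormalDist, fmean

def ci_coverage(true_mu: float = 10.0, sigma: float = 2.0, n: int = 30,
                conf: float = 0.95, n_sims: int = 10000, seed: int = 7) -> float:
    """Fraction of simulated z-intervals (sigma known) that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / n ** 0.5  # same width every time because sigma is known
    covered = 0
    for _ in range(n_sims):
        x_bar = fmean([rng.gauss(true_mu, sigma) for _ in range(n)])
        if x_bar - half_width <= true_mu <= x_bar + half_width:
            covered += 1
    return covered / n_sims

print(f"95% interval coverage: {ci_coverage():.3f}")
print(f"99% interval coverage: {ci_coverage(conf=0.99):.3f}")
```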


Review Topics—Class 5 Ch. 9.1, 9.2, 9.3, 9.4

Confidence Intervals (cont.)

Normal distribution ($\sigma$ known):
$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

Student's t distribution ($\sigma$ unknown; $t_{\alpha/2}$ has $n-1$ degrees of freedom):
$$\left(\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right)$$

Student's t-test
Used when we believe the data come from a normal distribution but do not know $\sigma$
We use $s$ to estimate $\sigma$:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
The t-statistic is calculated by
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
and has $n-1$ degrees of freedom
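The t-interval above can be sketched in a few lines of Python. The function name is illustrative, and the critical value $t_{0.025,16} = 2.120$ is an assumed input taken from a standard t table (rounded to three decimals), because the Python standard library has no t quantile function. The example applies it to the water-treatment pH data used in the practice questions.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Two-sided confidence interval x_bar +/- t_crit * s / sqrt(n) for a population mean."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# pH data from the practice questions: n = 17, mean 8.42, s = 0.16.
# t_{0.025,16} = 2.120 is read from a standard t table (16 df, two-sided 95%).
low, high = t_interval(8.42, 0.16, 17, t_crit=2.120)
print(f"95% CI for mean pH: ({low:.3f}, {high:.3f})")  # (8.338, 8.502)
```

Because 8.5 lies inside this interval, a two-sided test at $\alpha = 0.05$ does not reject $H_0: \mu = 8.5$.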


Practice Test Question #11

Most water treatment facilities monitor the quality of their drinking water on an hourly basis. One variable monitored is pH, which measures the degree of alkalinity or acidity in the water. A pH below 7.0 is acidic, one above 7.0 is alkaline, and a pH of 7.0 is neutral. One water treatment plant has a target pH of 8.5, as most facilities try to maintain a slightly alkaline level. The results for one particular hour, based on N=17 samples, are a mean of 8.42 and a standard deviation of 0.16.

Interest focuses on whether there is sufficient evidence to conclude that the mean pH level in the water differs from 8.5. The hypotheses are then:

A) $H_0: \mu = 8.5$, $H_A: \mu \ne 8.5$
B) $H_0: \mu \ne 8.5$, $H_A: \mu = 8.5$
C) $H_0: \mu = 0$, $H_A: \mu \ne 0$
D) $H_0: \mu = 0$, $H_A: \mu \ne 8.5$


Practice Test Question #11

Answer: A) $H_0: \mu = 8.5$, $H_A: \mu \ne 8.5$. The question asks whether the mean pH differs from the target of 8.5 in either direction, so 8.5 is the null value and the alternative is two-sided.


Practice Test Question #12

(Same scenario as Question #11: a target pH of 8.5; N=17 water samples with a mean pH of 8.42 and a standard deviation of 0.16.)

The degrees of freedom for this test are:

A) 7
B) 16
C) 8.5
D) Not applicable


Practice Test Question #12

This is a one-sample t-test. *NOTE* Do not be confused by the use of the word "samples" in the set-up; it refers to the number of water samples, so n = 17.

df = n – 1 = 17 – 1 = 16

Answer: B) 16


Practice Test Question #13

(Same scenario as Question #11: a target pH of 8.5; N=17 water samples with a mean pH of 8.42 and a standard deviation of 0.16.)

Is there sufficient evidence to conclude that the mean pH level in the water differs from 8.5 at the 0.05 level of significance?

A) Yes, because the sample mean was 8.42
B) Yes, since p<0.05
C) No, since p>0.05
D) No, since the sample mean (8.42) is practically equal to 8.5


Practice Test Question #13

Carry out the one-sample, two-sided t-test at $\alpha = 0.05$:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{8.42 - 8.5}{0.16/\sqrt{17}} = -2.062$$
with 16 df, so $2(0.025) < p < 2(0.05)$, i.e. $0.05 < p < 0.10$.

p > 0.05; do not reject $H_0$

In STATA: ttesti 17 8.42 0.16 8.5 gives p = 0.0559; do not reject $H_0$

Answer: C) No, since p>0.05
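The table-based reasoning above can be sketched in Python. The variable names are illustrative, and the critical values are assumed standard t-table entries for 16 df ($t_{0.05,16} = 1.746$, $t_{0.025,16} = 2.120$). The sketch reproduces the bracket $2(0.025) < p < 2(0.05)$; an exact p-value such as STATA's 0.0559 needs the t distribution's CDF, which the Python standard library does not provide.

```python
from math import sqrt

# Summary statistics from the water-treatment example.
n, x_bar, s, mu0 = 17, 8.42, 0.16, 8.5
df = n - 1

t_stat = (x_bar - mu0) / (s / sqrt(n))

# Upper-tail critical values for 16 df from a standard t table (rounded).
t_05, t_025 = 1.746, 2.120

# Two-sided p-value bracket: double the one-sided tail areas.
if abs(t_stat) >= t_025:
    bracket = "p <= 0.05"
elif abs(t_stat) >= t_05:
    bracket = "0.05 < p < 0.10"  # 2(0.025) < p < 2(0.05)
else:
    bracket = "p >= 0.10"

reject_h0 = abs(t_stat) >= t_025  # two-sided test at alpha = 0.05
print(f"t = {t_stat:.3f} on {df} df; {bracket}; reject H0: {reject_h0}")
```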



Practice Test Question #19

The mean cholesterol level of healthy 20-74-year-old males in the US is 211 mg/dL, with an unknown standard deviation. The average cholesterol level among 16 hypertensive males was 232 mg/dL. Conduct a statistical test at the 10% level of whether hypertension is associated with higher cholesterol levels (consult the STATA output below).

Assuming that the $\alpha$-level of the test is 10%, what can you conclude?

    Number of obs = 16
    ------------------------------------------------------------------------
    Variable |   Mean   Std. Err.        t    P>|t|   [95% Conf. Interval]
    ---------+--------------------------------------------------------------
           x |    232          12  19.3333   0.0000   206.4226    257.5774
    ------------------------------------------------------------------------
    Degrees of freedom: 15

                           Ho: mean(x) = 211

         Ha: mean < 211       Ha: mean ~= 211        Ha: mean > 211
          t =   1.7500          t =   1.7500          t =   1.7500
      P < t =   0.9497      P > |t| =   0.1005      P > t =   0.0503


Practice Test Question #19

This is a one-sample, one-sided t-test. Set up the hypotheses and select the relevant output:

$H_0: \mu \le 211$, $H_A: \mu > 211$

The alpha level is 0.10. The relevant p-value is P > t = 0.0503. Since p < 0.10, reject $H_0$; conclude that hypertensive males have higher mean cholesterol levels than the healthy population of US males aged 20-74.


Practice Test Question #19

Assuming that the $\alpha$-level of the test is 10%, what can you conclude?

A) The p-value of the test is 0.1005 and thus we fail to reject the null hypothesis and thus conclude that hypertensives have a higher cholesterol level.
B) The p-value of the test is 0.9497 and thus we fail to reject the null hypothesis and thus conclude that hypertensives have a lower cholesterol level.
C) The p-value of the test is 0.0503 and thus we fail to reject the null hypothesis and conclude that hypertension is not associated with higher cholesterol levels.
D) The p-value of the test is 0.0503 and thus we reject the null hypothesis and conclude that the mean cholesterol level for hypertensives is higher than that of healthy 20-74-year-old US males.

Answer: D
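STATA prints three p-values because it does not know which alternative you intend. The sketch below recomputes all three from $t = (232 - 211)/12 = 1.75$ and 15 df. Because the Python standard library has no t CDF, it uses a hand-rolled Simpson's-rule integration of the t density; the helper names and step count are assumptions, and in practice a library routine (for example SciPy's t distribution) would normally be used instead. Only the upper-tail value matches $H_A: \mu > 211$.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry of the t density about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

# Cholesterol example: null mean 211, sample mean 232, standard error 12, n = 16.
t_stat = (232 - 211) / 12  # 1.75
df = 16 - 1

p_lower = t_cdf(t_stat, df)               # Ha: mean < 211
p_two = 2 * (1 - t_cdf(abs(t_stat), df))  # Ha: mean ~= 211
p_upper = 1 - t_cdf(t_stat, df)           # Ha: mean > 211, the alternative in Question #19
print(f"P < t = {p_lower:.4f}   P > |t| = {p_two:.4f}   P > t = {p_upper:.4f}")
```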



Review Topics—Class 6 Ch. 11.2-11.3

Two Sample Tests—Independent Samples
Necessary assumptions for the t-test:
The two samples are independent
Each sample is approximately normally distributed
The variances of the two populations are not significantly different

If the assumptions hold, then we can use
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
with $n_1 + n_2 - 2$ degrees of freedom, where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

$H_0: \mu_1 = \mu_2$ (rewrite to $\mu_1 - \mu_2 = 0$), $H_A: \mu_1 \ne \mu_2$ (rewrite to $\mu_1 - \mu_2 \ne 0$)
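A minimal sketch of the pooled (equal-variance) t-statistic above, using Python's statistics module; the data are hypothetical and the function name is illustrative. It returns the statistic under $H_0: \mu_1 - \mu_2 = 0$ together with its $n_1 + n_2 - 2$ degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x1, x2):
    """Equal-variance two-sample t statistic for H0: mu1 - mu2 = 0, with its degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = variance(x1), variance(x2)  # sample variances (divisor n - 1)
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df  # pooled variance s_p^2
    t = (mean(x1) - mean(x2)) / (sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2))
    return t, df

# Hypothetical measurements for two independent groups.
group1 = [1, 2, 3, 4, 5]
group2 = [3, 4, 5, 6, 7]
t, df = pooled_t(group1, group2)
print(f"t = {t:.3f} with {df} df")  # t = -2.000 with 8 df
```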


Practice Test Question #1

T-tests are most useful for what type of data (variables)?

A) Continuous
B) Ordinal
C) Nominal
D) Binary


Practice Test Question #1

T-tests are used for continuous data. Recall:
Ordinal data have a natural order without a defined magnitude (ex. low, moderate, high)
Nominal data have categories with no order or rank (ex. gender, race)
Binary data are discrete data with only two possible outcomes (ex. cancer vs. no cancer)

Answer: A) Continuous

T-tests are used for continuous data.T-tests are used for continuous data. Recall:Recall:

Ordinal data have natural order without defined Ordinal data have natural order without defined magnitude (ex. low, moderate, high)magnitude (ex. low, moderate, high)

Nominal data have categories with no order or rank Nominal data have categories with no order or rank (ex. gender, race)(ex. gender, race)

Binary data are discrete data with only two possible Binary data are discrete data with only two possible outcomes (ex. cancer vs. no cancer)outcomes (ex. cancer vs. no cancer)

A) ContinuousA) Continuous

B) OrdinalB) Ordinal

C) NominalC) Nominal

D) BinaryD) Binary

Page 44

Review Topics—Class 6 Ch. 11.2-11.3

Two Sample Tests—Independent Samples: Unequal Variances

The other two assumptions (independent samples, approximate normality) must still hold.

Now we must use

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

with $v$ degrees of freedom, where

$$v = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$

Round $v$ down to the nearest integer.
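The unequal-variance statistic and its degrees of freedom $v$ can be sketched in Python as follows. This is an illustrative sketch, not the course's method: the function name `welch_t_test` is hypothetical, and the example reuses the smoker/non-smoker summary (SDs rounded to 2.0) purely to show the arithmetic, even though that practice question assumes equal variances.

```python
import math


def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic without assuming equal variances.

    Returns (t, v), where v is the Satterthwaite degrees of freedom
    rounded down to the nearest integer, as on the slide.
    """
    a = sd1 ** 2 / n1  # estimated variance of the first sample mean
    b = sd2 ** 2 / n2  # estimated variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(a + b)
    v_exact = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, math.floor(v_exact)


t, v = welch_t_test(4.1, 2.0, 75, 1.3, 2.0, 121)
print(f"t = {t:.4f}, v = {v}")
```

Because the two sample SDs are equal here, the t statistic matches the pooled version (about 9.53), but the degrees of freedom drop from 194 to $v = 156$ (the unrounded value is about 156.98).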

Page 45

Review Topics—Class 6 Ch. 11.1, 11.3

Two Sample Tests—Paired Samples

For each observation in group 1 there is a corresponding observation in group 2: $(x_{11}, x_{12}), (x_{21}, x_{22}), \ldots, (x_{n1}, x_{n2})$.

First find the difference between the corresponding observations:

$$d_t = x_{t1} - x_{t2}, \quad t = 1, 2, \ldots, n$$

$\delta = \mu_1 - \mu_2$ is the true difference in population means. Thus the hypotheses for a two-sided test are

$$H_0: \mu_1 - \mu_2 = 0, \qquad H_A: \mu_1 - \mu_2 \neq 0$$

The t-statistic is

$$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$

with $n - 1$ degrees of freedom, where $\bar{d}$ and $s_d$ are the mean and standard deviation of the differences.
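The paired procedure reduces to a one-sample t-test on the differences $d_t$. A minimal Python sketch, assuming raw paired observations are available; the function name `paired_t_test` and the three data pairs are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math
import statistics


def paired_t_test(group1, group2):
    """Paired t statistic computed from within-pair differences.

    Returns (t, df, d_bar, s_d).
    """
    if len(group1) != len(group2):
        raise ValueError("paired samples must have the same length")
    # Difference for each matched pair: d_t = x_t1 - x_t2
    diffs = [x1 - x2 for x1, x2 in zip(group1, group2)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample SD of the differences (divisor n - 1)
    t = d_bar / (s_d / math.sqrt(n))
    return t, n - 1, d_bar, s_d


# Hypothetical matched measurements (not data from the course)
t, df, d_bar, s_d = paired_t_test([5, 7, 6], [3, 4, 5])
print(f"d_bar = {d_bar}, s_d = {s_d:.3f}, t = {t:.4f}, df = {df}")
```

Here the differences are 2, 3 and 1, so $\bar{d} = 2$, $s_d = 1$ and $t = 2/(1/\sqrt{3}) = 2\sqrt{3} \approx 3.46$ with 2 degrees of freedom.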

Page 46

Practice Test Question #7

The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1 = 75 and a group of non-smokers of size n2 = 121. It is believed that the mean carboxyhemoglobin level of the smokers is higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha = 0.05.

Smokers: Number of obs = 75
Non-smokers: Number of obs = 121
------------------------------------------------------------------------------
Variable |     Mean   Std. Err.         t    P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
 Smokers |      4.1    .2309401   17.7535   0.0000    3.639842    4.560158
Non-smok |      1.3    .1818182      7.15   0.0000    .9400127    1.659987
---------+--------------------------------------------------------------------
    diff |      2.8    .2939238   9.52628   0.0000    2.220304    3.379696
------------------------------------------------------------------------------
Degrees of freedom: ???

Ho: mean(smoker) - mean(non-smoker) = diff = 0

    Ha: diff < 0           Ha: diff ~= 0           Ha: diff > 0
      t = 9.5263             t = 9.5263             t = 9.5263
  P < t = 1.0000         P > |t| = 0.0000         P > t = 0.0000

Page 47

Practice Test Question #7

What is the proper test of this hypothesis?

A) A z test

B) A one-sample t test

C) A paired t test

D) An unpaired t test

Page 48

Practice Test Question #7

Answer: D) An unpaired t test

Smokers and non-smokers form two independent samples with no pairing between them, and the population standard deviations are not known, so neither a z test, a one-sample t test, nor a paired t test applies.

Page 49

Practice Test Question #8

The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1 = 75 and a group of non-smokers of size n2 = 121. It is believed that the mean carboxyhemoglobin level of the smokers is higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha = 0.05.

What are the null and alternative hypotheses for this test? $\mu_1$ is the mean carboxyhemoglobin level among smokers, and $\mu_2$ is the mean level among non-smokers.

A) $H_0: \mu_1 = \mu_2$, $H_A: \mu_1 \neq \mu_2$

B) $H_0: \mu_1 \leq \mu_2$, $H_A: \mu_1 \neq \mu_2$

C) $H_0: \mu_1 \leq \mu_2$, $H_A: \mu_1 > \mu_2$

D) $H_0: \mu_1 \geq \mu_2$, $H_A: \mu_1 < \mu_2$

Page 50

Practice Test Question #8

Answer: C) $H_0: \mu_1 \leq \mu_2$, $H_A: \mu_1 > \mu_2$

The claim to be supported (smokers have higher levels) goes in the alternative hypothesis, which makes this a one-sided test.

Page 51

Practice Test Question #9

The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1 = 75 and a group of non-smokers of size n2 = 121. It is believed that the mean carboxyhemoglobin level of the smokers is higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha = 0.05.

What is the number of degrees of freedom associated with this test?

A) df = $n_1 - 1 = 74$

B) df = $n_2 - 1 = 120$

C) df = $n_1 + n_2 - 1 = 195$

D) df = $n_1 + n_2 - 2 = 194$

Page 52

Practice Test Question #9

This is a t-test of two independent samples with equal variances assumed, therefore:

$$df = n_1 + n_2 - 2 = 75 + 121 - 2 = 194$$

Answer: D) df = $n_1 + n_2 - 2 = 194$

Page 53

Practice Test Question #10

Based on the output, what is the decision?

A) Reject $H_0$; the carboxyhemoglobin levels are higher among non-smokers

B) Do not reject $H_0$; the carboxyhemoglobin levels are equal

C) Do not reject $H_0$; the carboxyhemoglobin levels are lower among non-smokers

D) Reject $H_0$; the carboxyhemoglobin levels are higher among smokers

Smokers: Number of obs = 75
Non-smokers: Number of obs = 121
------------------------------------------------------------------------------
Variable |     Mean   Std. Err.         t    P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
 Smokers |      4.1    .2309401   17.7535   0.0000    3.639842    4.560158
Non-smok |      1.3    .1818182      7.15   0.0000    .9400127    1.659987
---------+--------------------------------------------------------------------
    diff |      2.8    .2939238   9.52628   0.0000    2.220304    3.379696
------------------------------------------------------------------------------
Degrees of freedom: ???

Ho: mean(smoker) - mean(non-smoker) = diff = 0

    Ha: diff < 0           Ha: diff ~= 0           Ha: diff > 0
      t = 9.5263             t = 9.5263             t = 9.5263
  P < t = 1.0000         P > |t| = 0.0000         P > t = 0.0000

Page 54

Practice Test Question #10

To ease interpretation, rearrange the hypotheses:

$H_0: \mu_1 \leq \mu_2$ can be rewritten as $H_0: \mu_1 - \mu_2 \leq 0$

$H_A: \mu_1 > \mu_2$ can be rewritten as $H_A: \mu_1 - \mu_2 > 0$

Now select the matching output: the "Ha: diff > 0" column, where P > t = 0.0000. Since p < 0.05, reject $H_0$ and conclude that smokers have higher carboxyhemoglobin levels than non-smokers.
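Choosing the right column matters because the same t statistic gives different p-values under different alternatives. The Python sketch below is a hypothetical helper, not part of Stata: it assumes you have the upper-tail probability P(T > t) and relies on the symmetry of the t distribution to derive the other two p-values. The upper-tail value 0.03 is made up so that the three alternatives lead to different decisions.

```python
def p_value_for_alternative(p_upper, alternative):
    """Return the p-value that matches H_A, given the upper-tail
    probability P(T > t) for the observed t statistic.

    alternative: "greater" (Ha: diff > 0), "less" (Ha: diff < 0),
    or "two-sided" (Ha: diff ~= 0).
    """
    p_lower = 1.0 - p_upper  # P(T < t)
    if alternative == "greater":
        return p_upper
    if alternative == "less":
        return p_lower
    if alternative == "two-sided":
        # The t distribution is symmetric, so double the smaller tail
        return min(1.0, 2.0 * min(p_upper, p_lower))
    raise ValueError(f"unknown alternative: {alternative!r}")


def decide(p_value, alpha=0.05):
    """Compare a p-value with the significance level alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Hypothetical upper-tail probability (not from the smoker output)
p_upper = 0.03
for alt in ("greater", "less", "two-sided"):
    p = p_value_for_alternative(p_upper, alt)
    print(f"{alt:>9}: p = {p:.2f} -> {decide(p)}")
```

At alpha = 0.05 only the "greater" alternative rejects here (p = 0.03); the "less" (p = 0.97) and two-sided (p = 0.06) versions do not. In Question #10 the alternative is "greater", which is why the P > t column is the one to read.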

Page 55

Practice Test Question #10

Answer: D) Reject $H_0$; the carboxyhemoglobin levels are higher among smokers

Page 56

Practice Test Question #20

Ten infants were involved in a study to compare the effectiveness of two medications for the treatment of diaper rash. For each baby, two areas of approximately the same size and rash severity were selected, and one area was treated with medication A and the other area was treated with medication B. The number of hours for the rash to disappear was recorded for each medication and each infant. The question of interest: is there sufficient evidence to conclude that a difference exists in the mean time required for elimination of the rash? The appropriate statistical methodology to use is:

A) A 2-sample (independent) t-test

B) A z-test

C) A paired t-test

D) A sensitivity analysis

Page 57

Practice Test Question #20

Medication A and Medication B are "matched" to the same infant, so the samples are not independent.

We were not given a known standard deviation σ, which a z-test would require.

A sensitivity analysis is typically a sub-analysis that changes model parameters to see whether the results remain consistent. This concept has not been covered in class and is not relevant to this question.

This study calls for a paired t-test.

Answer: C) A paired t-test

Page 58

Review Topics—Class 6 Ch. 13.1-13.5

Nonparametric

Used when the data cannot be assumed to be normally distributed

We rank the data rather than using the raw data

Be able to interpret Stata output
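The ranking step behind rank-based tests (for example, the Wilcoxon rank sum test) can be sketched in Python as below. This is an illustrative sketch under stated assumptions: tied values receive the average of the ranks they span, the function names `midranks` and `rank_sum` are hypothetical, and the data are made up.

```python
def midranks(values):
    """Rank values from smallest (rank 1) to largest; tied values get
    the average of the ranks they would otherwise occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sum(group_a, group_b):
    """Sum of the ranks of group_a within the pooled sample."""
    ranks = midranks(list(group_a) + list(group_b))
    return sum(ranks[: len(group_a)])


# Hypothetical data, for illustration only
print(midranks([3, 1, 4, 1, 5]))
print(rank_sum([3, 1], [4, 1, 5]))
```

The two tied 1s share rank (1 + 2)/2 = 1.5, so the ranks are 3, 1.5, 4, 1.5, 5, and the first group's rank sum is 3 + 1.5 = 4.5. The test then compares this sum with what would be expected if the two groups came from the same distribution.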

Page 59

Practice Test Question #14

A clinical trial is conducted to compare two treatments for asthma. Asthmatics are split into two groups, with one group receiving Treatment A and the other group receiving Treatment B. At the end of the trial, the groups are compared with respect to FEV (forced expiratory volume). Differences found at the end of the trial may be due to:

A) Treatment effect differences

B) Chance

C) Group differences with respect to other variables (e.g., sex, age, race, severity of asthma)

D) All of the above

Page 60

Practice Test Question #14

Answer: D) All of the above

An observed difference in FEV at the end of the trial may reflect a true treatment effect, chance variation, or imbalance between the groups in other variables (e.g., sex, age, race, severity of asthma).

Page 61

GOOD LUCK!