Stat 255 Supplement 2011 Fall


Contents

1 Large Sample Confidence Intervals for µ

2 P-Values

3 Large Sample Inferences on µ1 − µ2

4 Brief Introduction to ANOVA

5 Brief Introduction to Simple Linear Regression

6 Brief Introduction to Chi-Square Tests

7 Formula List

8 Formula Review

9 First Block Sample Test

10 Second Block Sample Test

11 Third Block Sample Test

12 Sample Final Examination 1



15 Exercises

16 Minitab Assignments

16.1 Minitab Assignment 1





1 Central Limit Theorem and Large Sample Confidence Intervals for µ

Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a population random variable X whose distribution has mean µ, standard deviation σ, and any shape (not necessarily Normal). The Central Limit Theorem states that whenever the sample size n is sufficiently large (usually n > 25 will suffice), the standardized sample mean

\[ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]

has approximately a Standard Normal distribution. Therefore, if µ and σ are both known and n > 25, we can make probability statements about $\bar{X}$ that are valid regardless of the shape of the population distribution.

Example 1
A large population of seeds of the princess bean Phaseolus vulgaris is to be sampled. The weights of the seeds in the population follow a distribution having mean µ = 500 mg, standard deviation σ = 120 mg, and unknown shape. Suppose a random sample of 32 seeds is to be weighed, and let $\bar{X}$ represent the sample mean weight of the 32 seeds.

(a) What is the probability that the sample mean weight will fall between 490 mg and 520 mg?

(b) Find c such that there is a 75% chance that the sample mean weight $\bar{X}$ will fall between 500 − c and 500 + c milligrams.

Solution for Example 1
Because the population mean and population standard deviation are both known and the sample size is larger than 25, we use the Central Limit Theorem to handle both parts.

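(a)

\[
\begin{aligned}
P(490 < \bar{X} < 520) &= P\left(\frac{490 - 500}{120/\sqrt{32}} < \frac{\bar{X} - 500}{120/\sqrt{32}} < \frac{520 - 500}{120/\sqrt{32}}\right) \\
&\approx P(-0.47 < Z < 0.94) \\
&= P(Z < 0.94) - P(Z \le -0.47) \\
&= 0.8264 - 0.3192 = 0.5072,
\end{aligned}
\]

where Z is a standard normal random variable.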

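(b)

\[
0.75 = P(500 - c < \bar{X} < 500 + c) = P\left(\frac{-c}{120/\sqrt{32}} < \frac{\bar{X} - 500}{120/\sqrt{32}} < \frac{c}{120/\sqrt{32}}\right) \approx P\left(\frac{-c}{120/\sqrt{32}} < Z < \frac{c}{120/\sqrt{32}}\right)
\]

Since the standard normal distribution is symmetric about 0, each of the two tails outside this interval has probability (1 − 0.75)/2, so we obtain

\[ P\left(Z \le \frac{-c}{120/\sqrt{32}}\right) \approx 0.1250. \]

From the standard normal table, we find that P(Z ≤ −1.15) = 0.1251 ≈ 0.1250. Therefore, we set

\[ \frac{-c}{120/\sqrt{32}} = -1.15 \]

and solve for c, obtaining c = 24.4 mg.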
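Both parts of Example 1 can be checked numerically. The following is a minimal Python sketch that uses only the standard library's `statistics.NormalDist`; the variable names are illustrative and are not part of the course material. Because the z-values are not rounded to two decimals, the results differ slightly from the table-based answers.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500.0, 120.0, 32   # population mean, population SD, sample size
se = sigma / sqrt(n)              # standard deviation of the sample mean
Z = NormalDist()                  # standard normal distribution

# (a) P(490 < Xbar < 520), using the Central Limit Theorem approximation
prob_a = Z.cdf((520 - mu) / se) - Z.cdf((490 - mu) / se)

# (b) c with P(500 - c < Xbar < 500 + c) = 0.75:
#     the upper endpoint is the 0.875 quantile of the standardized mean
c = Z.inv_cdf(1 - (1 - 0.75) / 2) * se

print(f"(a) probability = {prob_a:.4f}")
print(f"(b) c = {c:.1f} mg")
```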

In Example 1 we were able to handle the probability questions about the sample mean $\bar{X}$ using the Central Limit Theorem, because the sample size was larger than 25 and, although the exact shape of the population distribution was unknown, the two population parameters µ and σ were known.

When the population parameters µ and σ are unknown, our goals change from answering probability questions to answering statistical questions. Instead of making probability statements about $\bar{X}$, we observe the random sample $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ and we use these data to calculate the observed sample mean $\bar{x}$, an estimate of the population mean µ, as well as the observed value s of the sample standard deviation, an estimate of the population standard deviation σ. Because the population mean µ carries much useful information about the population, most of our efforts focus on this parameter. Not only are we interested in the observed value of the sample mean $\bar{x}$, which gives in some sense our best point estimate of the population mean µ, but we also want to construct an interval of values centered at $\bar{x}$ that almost surely includes the true value of µ. This leads us to the notion of a confidence interval (CI).

Definition

The random interval $(\bar{X} - d, \bar{X} + d)$ is called a 100(1 − α)% confidence interval for µ provided $P(\bar{X} - d < \mu < \bar{X} + d) = 1 - \alpha$.

The value α is specified by the user at a suitably low level (typically 0.05, 0.01, or 0.10, which yield confidence levels 95%, 99%, or 90%, respectively).

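We begin our search for the sampling allowance d by first considering the (admittedly unrealistic) case where the population mean µ is unknown, but the population standard deviation σ is known. When n > 25, we obtain

\[
\begin{aligned}
1 - \alpha &= P(\bar{X} - d < \mu < \bar{X} + d) = P(-d < \bar{X} - \mu < d) \\
&= P\left(\frac{-d}{\sigma/\sqrt{n}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{d}{\sigma/\sqrt{n}}\right) \approx P\left(\frac{-d}{\sigma/\sqrt{n}} < Z < \frac{d}{\sigma/\sqrt{n}}\right)
\end{aligned}
\]

by the Central Limit Theorem.

Therefore $\frac{d}{\sigma/\sqrt{n}} = z_{\alpha/2}$, where $z_{\alpha/2}$ is defined to be that unique positive real number such that $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$, or equivalently such that $P(Z \ge z_{\alpha/2}) = \frac{\alpha}{2} = P(Z \le -z_{\alpha/2})$.

Hence, $d = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$.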
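The critical value $z_{\alpha/2}$ is usually read from a standard normal table, but it can also be computed as the $1 - \alpha/2$ quantile of the standard normal distribution. The sketch below assumes Python's standard library; the helper name `z_critical` is hypothetical. Printed tables round these values, so a table may show 2.575 where the computed value rounds to 2.576.

```python
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """Return z_{alpha/2}: the point with upper-tail area alpha/2 under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# Critical values for the three most common confidence levels
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: z = {z_critical(alpha):.3f}")
```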

The distributional fact that led to this result is: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{\cdot}{\sim} N(0, 1)$ when n > 25. Also, this approximation remains valid when σ is replaced by the sample standard deviation

\[ S = \sqrt{\frac{1}{n - 1}\sum_{i=1}^{n}(X_i - \bar{X})^2}, \]

provided n is sufficiently large (usually n > 40 will suffice). That is, $\frac{\bar{X} - \mu}{S/\sqrt{n}} \overset{\cdot}{\sim} N(0, 1)$ when n > 40.

Now we have the tools we need to handle the more realistic case where both population parameters µ and σ are unknown. We simply modify the result in the first case by replacing σ with S and obtain the following:

\[ \left(\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}\right) \]

is an approximate 100(1 − α)% confidence interval for µ when n > 40.

Example 2
Low bone mineral density often leads to hip fractures in the elderly. In an experiment to assess the effectiveness of hormone replacement therapy, researchers gave conjugated equine estrogen (CEE) to a sample of 94 women between the ages of 45 and 64. After taking the medication for 36 months, the bone mineral density was measured for each of the 94 women. The sample mean density was 0.878 g/cm², with a sample standard deviation of 0.126 g/cm². Find a 95% confidence interval for the true mean hip bone mineral density µ of all women age 45 to 64 who take CEE for 36 months.

Solution for Example 2
95% CI for µ:

\[ \bar{x} \pm z_{.025}\frac{s}{\sqrt{n}} = 0.878 \pm 1.96\,\frac{0.126}{\sqrt{94}} = 0.878 \pm 0.025 \quad\text{or}\quad (0.853,\ 0.903)\ \text{g/cm}^2 \]

The critical value $z_{.025} = 1.96$ is obtained from the standard normal table entry

\[ P(Z \le -1.96) = 0.025 = \frac{0.05}{2}. \]

Thus, we are approximately 95% confident that the true mean hip bone mineral density of all women age 45 to 64 who take CEE for 36 months is between 0.853 g/cm² and 0.903 g/cm².
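The interval in Example 2 can be reproduced with a few lines of code. This is a minimal sketch, assuming Python's standard library; the function name `large_sample_ci` is hypothetical and simply evaluates $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(xbar, s, n, confidence=0.95):
    """Approximate CI for mu: xbar +/- z_{alpha/2} * s / sqrt(n), intended for n > 40."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    d = z * s / sqrt(n)                       # sampling allowance
    return xbar - d, xbar + d

# Example 2: bone mineral density, n = 94, xbar = 0.878, s = 0.126
lower, upper = large_sample_ci(0.878, 0.126, 94)
print(f"95% CI: ({lower:.3f}, {upper:.3f}) g/cm^2")
```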


2 Significance Testing: The P-Value Approach

Hypothesis testing questions involve deciding between two contradictory hypotheses: the null hypothesis H0 and the alternative hypothesis H1. The null hypothesis is formed in such a way that it is capable of rejection. For example, if we are interested in the claim that there are coyotes in Beacon Hill park we can consider two hypotheses.

1. There are coyotes in Beacon Hill park.

2. There are no coyotes in Beacon Hill park.

Which of these is refutable? If we find coyotes in Beacon Hill park, then (2) is refuted. If we don’t find coyotes in Beacon Hill park, then we don’t know whether there are no coyotes or we simply missed seeing them, so (1) cannot be refuted in this way. The null hypothesis always comes in the form of (2); it says that “nothing happens”.

If we are to compare two population means to find out if they differ, the null hypothesis would be that the population means are the same. If we want to find out if the proportion of Victorians who subscribe to the Times Colonist is more than 50%, then the null hypothesis is that the population proportion is 50%. Notice that the alternative hypothesis is the research question (population means differ) whereas the null hypothesis is the complement to the research question.

The approach is then to put the null hypothesis on trial and assume that H0 is true until we have found substantial statistical evidence to reject H0. In statistics, evidence is summarized in the form of an observed value called a test statistic. A test statistic is a function of the data and is therefore a random variable. Under H0 (that is, assuming H0 is true) the test statistic will have a known distribution. The test statistic is constructed so that it takes on more extreme values when the data come from a process other than that where H0 is true. This is similar to a court of law (in Canada). The accused is assumed innocent (null hypothesis) until enough evidence is brought forward to conclude a guilty verdict (alternative hypothesis).

The classical approach to hypothesis testing involves the construction of a level α rejection region, which is a set of extreme values that has only a small chance α of including the observed value of the test statistic under H0. The user of a level α test rejects H0 if and only if the observed value of the test statistic falls inside the level α rejection region. For a given hypothesis testing question, the significance level α, which is the probability of rejecting H0 when H0 is true (or Type 1 error probability), is specified by the user at a sufficiently low level (typically 0.10, or 0.05, or 0.01) to give the desired control over the chance of wrongly rejecting H0.

The P-value approach to hypothesis testing will be emphasized in this course. Here, in order to judge the strength of statistical evidence against H0 in favour of H1, we ask: “If H0 is true and we were to rerun the experiment, what would be the chance of finding evidence against H0 in favour of H1 at least as strong as the observed evidence?” This chance is the P-value. The smaller the P-value, the more rare the observed result if H0 is true, and the stronger the statistical case against H0.

For a given hypothesis testing question, the P-value is calculated from the observed value of the test statistic as follows:


P-value = the probability, computed under the assumption that H0 is true, that a rerun of the experiment would yield a value of the test statistic that is at least as extreme (i.e. would yield at least as much evidence against H0 in favour of H1) as the observed value.

The P-Value Approach, step-by-step:

1. Define the parameter(s) to be tested. Use standard notation.

2. Specify H0 and H1.

3. Specify the Test Statistic and identify its (approximate) distribution under H0.

4. Compute the observed value of the Test Statistic.

5. Compute the P-value.

6. Report strength of evidence (very strong if P-value ≤ 0.01, strong if 0.01 < P-value ≤ 0.05, moderate if 0.05 < P-value ≤ 0.10, little or none if 0.10 < P-value) against H0 in favour of H1, and report the estimated value of the parameter being tested plus the estimated standard error of the parameter being tested.

7. If asked to test H0 at level α, compare α with the P-value and reject H0 if and only if the P-value ≤ α. By doing this, a classical level α test can be carried out without ever constructing a level α rejection region.
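Steps 6 and 7 are simple threshold rules, so they can be written down directly. The sketch below is an illustration in Python; the function names are hypothetical, and the P-value 0.03 in the usage line is a made-up input, not a result from these notes.

```python
def strength_of_evidence(p_value: float) -> str:
    """Map a P-value to the verbal scale of step 6."""
    if p_value <= 0.01:
        return "very strong"
    if p_value <= 0.05:
        return "strong"
    if p_value <= 0.10:
        return "moderate"
    return "little or none"

def reject_h0(p_value: float, alpha: float) -> bool:
    """Step 7: reject H0 at level alpha if and only if the P-value is at most alpha."""
    return p_value <= alpha

# Hypothetical P-value, for illustration only
print(strength_of_evidence(0.03), reject_h0(0.03, alpha=0.05))
```

Note that the same P-value can lead to different level α decisions: a P-value of 0.03 rejects H0 at α = 0.05 but not at α = 0.01.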

Example 1
A random sample of 49 four-year-old Red Pine trees was selected, and the diameter of each tree’s main stem was measured. The sample mean diameter was found to be 14.64 cm and the sample standard deviation was 2.85 cm. Do these data provide substantial evidence that the true mean diameter of four-year-old Red Pine trees in the sampled region differs from 14 cm?

Solution for Example 1

1. µ = true mean diameter of four-year-old Red Pine trees in the sampled region

2. H0 : µ = 14 vs. H1 : µ ≠ 14

3. Test Statistic $Z = \frac{\bar{X} - 14}{S/\sqrt{n}} \overset{\cdot}{\sim} N(0, 1)$ under H0 because n > 40

4. $Z_{obs} = \frac{14.64 - 14}{2.85/\sqrt{49}} = 1.57$

5. P-value ≈ P(Z ≥ 1.57 or Z ≤ −1.57) = 2(0.0582) = 0.1164

6. There is little or no evidence (P-value = 0.1164) against H0 : µ = 14. The estimated value of µ is $\bar{x}$ = 14.64 cm, with estimated standard error = $s/\sqrt{n} = 2.85/\sqrt{49}$ = 0.4071 cm.


Example 2
The oxygen uptakes during incubation of a random sample of 45 cell suspensions yielded a sample mean of 13.43 mL and a sample standard deviation of 2.28 mL. Do these data provide substantial evidence that the true mean oxygen uptake during incubation is higher than 12.5 mL? Test the relevant hypotheses at level α = 0.05.

Solution for Example 2

1. µ = true mean oxygen uptake during incubation (in mL)

2. H0 : µ = 12.5 vs. H1 : µ > 12.5

3. Test Statistic $Z = \frac{\bar{X} - 12.5}{S/\sqrt{n}} \overset{\cdot}{\sim} N(0, 1)$ under H0 because n > 40

4. $Z_{obs} = \frac{13.43 - 12.5}{2.28/\sqrt{45}} = 2.74$

5. P-value ≈ P(Z ≥ 2.74) = 0.0031

6. There is very strong evidence (P-value = 0.0031) against H0 : µ = 12.5. The estimated value of µ is 13.43 mL, with estimated standard error = $2.28/\sqrt{45}$ = 0.34 mL.

7. Since the P-value ≤ 0.05, reject H0 at level α = 0.05.

Example 3
A random sample of 60 air samples taken at the same site over a period of 3 months yielded a sample mean amount of suspended particulate matter equal to 38.9 µg/m³ of air and a sample standard deviation of 5.1 µg/m³. Do these data indicate that the true mean amount of suspended particulate matter at this site is under 40 µg/m³, the established maximum safe level?

Solution for Example 3

1. µ = true mean amount of suspended particulate matter in the air at the sampled site (in µg/m³)

2. H0 : µ = 40 vs. H1 : µ < 40

3. Test Statistic $Z = \frac{\bar{X} - 40}{S/\sqrt{n}} \overset{\cdot}{\sim} N(0, 1)$ under H0 because n > 40

4. $Z_{obs} = \frac{38.9 - 40}{5.1/\sqrt{60}} = -1.67$

5. P-value ≈ P(Z ≤ −1.67) = 0.0475

6. There is strong evidence (P-value = 0.0475) against H0 : µ = 40. The estimated value of µ is 38.9 µg/m³, with estimated standard error = $5.1/\sqrt{60}$ = 0.658 µg/m³.
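The three worked examples above differ only in the direction of H1, which determines which tail (or tails) of the standard normal distribution gives the P-value. The following sketch, assuming Python's standard library and a hypothetical helper `z_test`, recomputes steps 4 and 5 for each example. It uses unrounded z-values, so the P-values agree with the table-based ones only to about three decimals.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, s, n, mu0, alternative):
    """Large-sample z test of H0: mu = mu0.

    alternative is 'two-sided' (mu != mu0), 'greater' (mu > mu0) or 'less' (mu < mu0).
    Returns (z_obs, p_value, estimated standard error).
    """
    se = s / sqrt(n)
    z_obs = (xbar - mu0) / se
    Z = NormalDist()
    if alternative == "greater":
        p_value = 1 - Z.cdf(z_obs)
    elif alternative == "less":
        p_value = Z.cdf(z_obs)
    elif alternative == "two-sided":
        p_value = 2 * (1 - Z.cdf(abs(z_obs)))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z_obs, p_value, se

examples = {
    "Example 1 (Red Pine)": (14.64, 2.85, 49, 14.0, "two-sided"),
    "Example 2 (oxygen uptake)": (13.43, 2.28, 45, 12.5, "greater"),
    "Example 3 (particulates)": (38.9, 5.1, 60, 40.0, "less"),
}
for name, args in examples.items():
    z_obs, p_value, se = z_test(*args)
    print(f"{name}: z = {z_obs:.2f}, P-value = {p_value:.4f}, SE = {se:.4f}")
```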


3 Large Sample Inferences on µ1 − µ2

Suppose the distribution of Population 1 has unknown mean µ1, unknown standard deviation σ1, and any shape (not necessarily Normal), while the distribution of Population 2 has unknown mean µ2, unknown standard deviation σ2, and any shape (not necessarily Normal). To compare the two population means, we form the difference µ1 − µ2, collect two independent random samples (one from each population), and use these data to make inferences about the single parameter µ1 − µ2. As long as both sample sizes are larger than 40, our inferences are valid regardless of the shapes of the two population distributions.

We let $\bar{X}_i$ and $S_i$ denote, respectively, the sample mean and sample standard deviation based on the random sample to be drawn from Population i, for i = 1, 2. Our estimator $\bar{X}_1 - \bar{X}_2$ is an unbiased estimator for µ1 − µ2, because $E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$. Also, because $\bar{X}_1$ and $\bar{X}_2$ are independent random variables, the standard error of $\bar{X}_1 - \bar{X}_2$ for estimating µ1 − µ2 is

\[ SD(\bar{X}_1 - \bar{X}_2) = \sqrt{V(\bar{X}_1 - \bar{X}_2)} = \sqrt{V(\bar{X}_1) + (-1)^2 V(\bar{X}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. \]

Therefore, the estimated standard error of $\bar{X}_1 - \bar{X}_2$ for estimating µ1 − µ2 is $\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$. Here $n_i$ denotes the sample size of the random sample to be drawn from Population i, for i = 1, 2. As in the single population case, the estimator standardized using the estimated standard error has approximately a Standard Normal distribution when both sample sizes are sufficiently large (usually n1 and n2 both larger than 40 will suffice). That is,

\[ \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \overset{\cdot}{\sim} N(0, 1) \quad\text{whenever } n_1 > 40 \text{ and } n_2 > 40. \]

This distributional fact gives us the tool we need to develop large sample procedures for comparing two population means. The basic structure here is the same as in the single population setting. We illustrate this through two examples.

Example 1
In a study of the periodical cicada (Magicicada septendecim), researchers measured the hind tibia lengths of the shed skins of 110 individuals. Results for males and females are shown in the accompanying table.

Tibia length (µm)

Group     Sample Size   Sample Mean   Sample Std Dev
Males     60            78.42         2.87
Females   50            80.44         3.52

(a) Is there any evidence against the hypothesis that the true mean tibia lengths are independent of sex? Test the relevant hypotheses at the 10% significance level.

(b) Construct a 90% confidence interval for the difference between true mean tibia length for males and that for females.


(a) 1. µ1 = true mean tibia length for male periodical cicadas
       µ2 = true mean tibia length for female periodical cicadas

2. H0 : µ1 − µ2 = 0 vs. H1 : µ1 − µ2 ≠ 0

3. Test Statistic $Z = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \overset{\cdot}{\sim} N(0, 1)$ under H0, because n1 > 40 and n2 > 40.

4. $Z_{obs} = \frac{(78.42 - 80.44) - 0}{\sqrt{\frac{(2.87)^2}{60} + \frac{(3.52)^2}{50}}} = \frac{-2.02 - 0}{0.62} = -3.26$

5. P-value ≈ P(Z ≤ −3.26 or Z ≥ 3.26) = 2P(Z ≤ −3.26) = 2(0.0006) = 0.0012

6. There is very strong evidence (P-value = 0.0012) against H0 : µ1 − µ2 = 0. The estimated value of µ1 − µ2 is −2.02 µm, with estimated standard error = 0.62 µm.

7. Since P-value ≤ 0.10, reject H0 at level α = 0.10 and claim that µ1 − µ2 ≠ 0.

(b) 90% CI for µ1 − µ2:

\[ \bar{x}_1 - \bar{x}_2 \pm z_{.05}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = -2.02 \pm (1.645)(0.62) = -2.02 \pm 1.02 \quad\text{or}\quad (-3.04,\ -1.00). \]

Thus, we are 90% confident that µ2 is between 1.00 and 3.04 µm larger than µ1.

Example 2
A pain-killing drug was tested for efficacy in 100 women who were experiencing uterine cramping pain following childbirth. Fifty of the women were randomly allocated to receive the drug, and the remaining 50 received a placebo (inert substance). Capsules of drug or placebo were given before breakfast and again at noon. A pain relief score, based on hourly questioning throughout the day, was computed for each woman. The possible pain relief scores ranged from 0 (no relief) to 56 (complete relief for 8 hours). Summary results are shown in the table.

Pain Relief Score

Treatment   Sample Size   Sample Mean   Sample Std Dev
Drug        50            31.96         12.05
Placebo     50            25.32         13.78

(a) Do these data suggest that the drug will raise the true mean pain relief score for women experiencing post-childbirth uterine cramping pain by more than 5 points? Test at the 1% significance level.

(b) Construct a 99% confidence interval for the difference in true mean pain relief scores for the two groups of women who were experiencing uterine cramping pain following childbirth.


(a) 1. µ1 = true mean pain relief score for women receiving the drug to relieve post-delivery uterine cramping pain
       µ2 = true mean pain relief score for women receiving the placebo to relieve post-delivery uterine cramping pain

2. H0 : µ1 − µ2 = 5 vs. H1 : µ1 − µ2 > 5

3. Test Statistic $Z = \frac{(\bar{X}_1 - \bar{X}_2) - 5}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \overset{\cdot}{\sim} N(0, 1)$ under H0, because n1 > 40 and n2 > 40.

4. $Z_{obs} = \frac{(31.96 - 25.32) - 5}{\sqrt{\frac{(12.05)^2}{50} + \frac{(13.78)^2}{50}}} = \frac{6.64 - 5}{2.59} = 0.63$

5. P-value ≈ P(Z ≥ 0.63) = 0.2643

6. There is little or no evidence (P-value = 0.2643) against H0 : µ1 − µ2 = 5. The estimated value of µ1 − µ2 is 6.64, with estimated standard error = 2.59.

7. Since P-value > 0.01, do not reject H0 at level α = 0.01.

(b) 99% CI for µ1 − µ2:

\[ \bar{x}_1 - \bar{x}_2 \pm z_{.005}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 6.64 \pm (2.575)(2.59) = 6.64 \pm 6.67 \quad\text{or}\quad (-0.03,\ 13.31). \]

Thus, we are 99% confident that the true value of µ1 − µ2 lies between −0.03 and 13.31 pain relief score points.
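Both two-sample examples follow the same arithmetic: an estimated standard error, a standardized difference, a tail probability, and an interval. The sketch below recomputes them with Python's standard library; the helper names `two_sample_z` and `two_sample_ci` are hypothetical. Since it works with unrounded values, small differences from the hand calculations (which round the standard error and use table values) are expected.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(x1, s1, n1, x2, s2, n2, delta0=0.0):
    """Return (estimate, estimated SE, z_obs) for testing H0: mu1 - mu2 = delta0."""
    estimate = x1 - x2
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    return estimate, se, (estimate - delta0) / se

def two_sample_ci(estimate, se, confidence):
    """Approximate CI for mu1 - mu2: estimate +/- z_{alpha/2} * SE."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return estimate - z * se, estimate + z * se

Z = NormalDist()

# Example 1 (cicadas): two-sided test of mu1 - mu2 = 0, then a 90% CI
est1, se1, z1 = two_sample_z(78.42, 2.87, 60, 80.44, 3.52, 50)
p1 = 2 * Z.cdf(-abs(z1))
ci1 = two_sample_ci(est1, se1, 0.90)

# Example 2 (pain relief): upper-tailed test of mu1 - mu2 = 5, then a 99% CI
est2, se2, z2 = two_sample_z(31.96, 12.05, 50, 25.32, 13.78, 50, delta0=5.0)
p2 = 1 - Z.cdf(z2)
ci2 = two_sample_ci(est2, se2, 0.99)

print(f"Example 1: z = {z1:.2f}, P-value = {p1:.4f}, 90% CI = ({ci1[0]:.2f}, {ci1[1]:.2f})")
print(f"Example 2: z = {z2:.2f}, P-value = {p2:.4f}, 99% CI = ({ci2[0]:.2f}, {ci2[1]:.2f})")
```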


4 Brief Introduction to Analysis of Variance (ANOVA)

One-Way ANOVA

Suppose we want to compare k population means µ1, µ2, . . . , µk, where all of the population distributions are normal (bell-shaped) and where the factors causing variation in these populations are similar enough to make reasonable the assumption that all k population variances are the same. Inferences comparing the population means are based on k independent random samples of sizes $n_1, n_2, \ldots, n_k$, respectively. We use double-subscripting with $X_{ij}$ denoting the jth observation in the random sample to be taken from the ith population, so $X_{i1}, X_{i2}, \ldots, X_{in_i}$ is a random sample from a $N(\mu_i, \sigma)$ population, for i = 1, 2, . . . , k. The ith sample mean

\[ \bar{X}_{i\cdot} = \frac{X_{i1} + X_{i2} + \cdots + X_{in_i}}{n_i} \]

is a good unbiased estimator for µi, and the error mean square

\[ MSE = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2 + \cdots + (n_k - 1)S_k^2}{N - k} \]

is a good unbiased estimator for the common variance σ². Here, $N = n_1 + n_2 + \cdots + n_k$, and

\[ S_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}_{i\cdot}\right)^2 \]

is the ith sample variance, for i = 1, 2, . . . , k.

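To test H0 : µ1 = µ2 = · · · = µk versus H1 : H0 is false, we use the test statistic:

\[ F = \frac{\frac{1}{k - 1}\sum_{i=1}^{k} n_i\left(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}\right)^2}{MSE} = \frac{MSTr}{MSE}, \]

where $\bar{X}_{\cdot\cdot} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} X_{ij}}{N}$ is the grand sample mean, and MSTr is called the treatment mean square.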

The denominator of this test statistic should take a value near the common variance σ² regardless of whether or not H0 is true, but the numerator is sensitive to the hypotheses. If the null hypothesis is true, all of the k different sample means should take values near each other and hence near the grand sample mean. Therefore, small values of the non-negative valued test statistic F are consistent with H0, and large values of F discredit H0. When the null hypothesis is true, the test statistic has an F-distribution with k − 1 degrees of freedom for the numerator and N − k degrees of freedom for the denominator, so the P-value corresponding to $F_{obs}$ is the probability that an F(k − 1, N − k) random variable will take a value at least as large as $F_{obs}$.

If k = 2, this F-test is equivalent to the pooled t-test for testing H0 : µ1 = µ2 vs H1 : µ1 ≠ µ2.

Tukey’s Pairwise Comparisons give 100(1 − α)% confidence intervals valid simultaneously for all pairwise differences of means.

Example
Six samples of each of four types of cereal grain grown in a certain region were analyzed to determine thiamin content, resulting in the following data (µg/g):

Wheat 5.2 4.5 6.0 6.1 6.7 5.8

Barley 6.5 8.0 6.1 7.5 5.9 5.6

Maize 5.8 4.7 6.4 4.9 6.0 5.2

Oats 8.3 6.1 7.8 7.0 5.5 7.2


Assume thiamin content measurements are normally distributed in each of the four different populations and the variance is homogeneous. Use Minitab to complete parts (a) and (b).

(a) Do these data suggest that at least two of the grains differ with respect to true mean thiamin content?

(b) Interpret Tukey’s 95% simultaneous confidence intervals. Use these results to identify which pairs of means are different; for each unequal pair, identify which mean is larger.

One-way ANOVA: Response versus Factor

Source DF SS MS F P

Factor 3 8.983 2.994 3.96 0.023

Error 20 15.137 0.757

Total 23 24.120

S = 0.8700 R-Sq = 37.24% R-Sq(adj) = 27.83%

Individual 95% CIs For Mean Based on

Pooled StDev

Level N Mean StDev -+---------+---------+---------+--------

1 6 5.7167 0.7679 (--------*---------)

2 6 6.6000 0.9508 (---------*--------)

3 6 5.5000 0.6693 (---------*--------)

4 6 6.9833 1.0420 (--------*---------)

-+---------+---------+---------+--------

4.80 5.60 6.40 7.20

Pooled StDev = 0.8700

Tukey 95% Simultaneous Confidence Intervals

All Pairwise Comparisons among Levels of Factor

Individual confidence level = 98.89%

Factor = 1 subtracted from:

Factor Lower Center Upper ---------+---------+---------+---------+

2 -0.5231 0.8833 2.2898 (--------*--------)

3 -1.6231 -0.2167 1.1898 (---------*--------)

4 -0.1398 1.2667 2.6731 (--------*---------)

---------+---------+---------+---------+

-1.5 0.0 1.5 3.0



3 -2.5064 -1.1000 0.3064 (---------*--------)

4 -1.0231 0.3833 1.7898 (---------*--------)

---------+---------+---------+---------+

-1.5 0.0 1.5 3.0




4 0.0769 1.4833 2.8898 (--------*--------)

---------+---------+---------+---------+

-1.5 0.0 1.5 3.0

Let µ1, µ2, µ3, and µ4 denote true mean thiamin content in the sampled population of wheat, barley, maize, and oats, respectively.

(a) H0 : µ1 = µ2 = µ3 = µ4 vs. H1 : H0 is false
There is strong evidence (P-value = 0.023) against H0.

(b) With 95% confidence the following six statements are true:

−0.5231 < µ2 − µ1 < 2.2898, −1.6231 < µ3 − µ1 < 1.1898, −0.1398 < µ4 − µ1 < 2.6731,

−2.5064 < µ3 − µ2 < 0.3064, −1.0231 < µ4 − µ2 < 1.7898, 0.0769 < µ4 − µ3 < 2.8898.

Therefore, with more than 95% confidence, µ4 > µ3.
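The entries of the ANOVA table for the thiamin example can be recomputed directly from the formulas for MSTr and MSE. This is a minimal sketch using only Python's standard library; the function name `one_way_anova` is hypothetical. The P-value itself requires the F-distribution, which the standard library does not provide, so it is left to Minitab or an F table.

```python
from statistics import mean, variance

thiamin = {
    "Wheat":  [5.2, 4.5, 6.0, 6.1, 6.7, 5.8],
    "Barley": [6.5, 8.0, 6.1, 7.5, 5.9, 5.6],
    "Maize":  [5.8, 4.7, 6.4, 4.9, 6.0, 5.2],
    "Oats":   [8.3, 6.1, 7.8, 7.0, 5.5, 7.2],
}

def one_way_anova(samples):
    """Return (MSTr, MSE, F, numerator df, denominator df) for a list of samples."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / N
    # Treatment sum of squares: spread of the sample means about the grand mean
    sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Error sum of squares: pooled within-sample variation, (n_i - 1) * S_i^2
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    mstr = sstr / (k - 1)
    mse = sse / (N - k)
    return mstr, mse, mstr / mse, k - 1, N - k

mstr, mse, f_obs, df_num, df_den = one_way_anova(list(thiamin.values()))
print(f"MSTr = {mstr:.3f}, MSE = {mse:.3f}, F = {f_obs:.2f} on ({df_num}, {df_den}) df")
```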


5 Brief Introduction to Simple Linear Regression

Simple Linear Regression Model
The response random variable Y depends on predictor variable X such that the (conditional) distribution of Y, given X = x, is $N(\mu_{Y|x} = \beta_0 + \beta_1 x,\ \sigma)$. Note that the mean of Y, given X = x, is a linear function of x, and the variance of Y is the same for all values of the predictor variable. In some cases the predictor variable is a random variable, and in others it is a variable controlled by the researcher.

An (x, y) data pair is obtained by observing the response Y = y when the value of the predictor is x. From n independent trials, we obtain the data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. These data are used to estimate the three unknown model parameters: $\beta_0$, $\beta_1$, $\sigma^2$. The unbiased estimates are denoted by $\hat{\beta}_0$, $\hat{\beta}_1$, MSE, respectively. Then $\mu_{Y|x} = \beta_0 + \beta_1 x$ is estimated by $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$. The y-intercept $\hat{\beta}_0$ and the slope $\hat{\beta}_1$ of the estimated regression line are determined to minimize $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$. The formulas for calculating the parameter estimates are:

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}, \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad MSE = \frac{1}{n - 2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2. \]

It can be shown that the total y-variation (called Total SS by Minitab), $\sum_{i=1}^{n}(y_i - \bar{y})^2$, equals the amount of y-variation explained by the simple linear regression model (called Regression SS by Minitab), $\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$, plus the amount of y-variation due to error (called Residual Error SS by Minitab), $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. The term R-Sq = (Regression SS)/(Total SS) gives the proportion of y-variation explained by the simple linear regression model.

To test H0 : $\mu_{Y|x}$ does not depend on x (i.e. $\beta_1 = 0$) versus H1 : H0 is false, we use the test statistic

\[ F = \frac{\left(\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2\right) \div 1}{MSE} = \frac{MSR}{MSE}, \]

where MSR is called Regression MS by Minitab and MSE is called Residual Error MS by Minitab. The denominator of this test statistic should take a value near the variance σ² regardless of whether or not H0 is true, but the numerator is sensitive to the hypothesis. If the null hypothesis is true, each fitted response $\hat{y}_i$ should be near $\bar{y}$. Therefore, small values of the non-negative valued test statistic F are consistent with H0, and large values of F discredit H0. When the null hypothesis is true, the test statistic has an F-distribution with 1 degree of freedom for the numerator and n − 2 degrees of freedom for the denominator, so the P-value corresponding to $F_{obs}$ is the probability that an F(1, n − 2) random variable will take a value at least as large as $F_{obs}$.


Example 1
An experiment to investigate the variability of soil water properties and crop yield in a sloped watershed gave the following data on grain sorghum yield y (in g/m-row) and distance upslope x (in m) on a sloping watershed:

x 0 10 20 30 45 50 70 80 100 120 140 160 170 190

y 500 590 410 470 450 480 510 450 360 400 300 410 280 350

Use Minitab to analyze these data.

(a) Construct a fitted line plot. Does the simple linear regression model appear to be plausible?

(b) Is there any evidence that true mean grain sorghum yield depends on the distance upslope of the planting?

(c) Estimate true mean yield when distance upslope is 75 m by giving a 95% confidence interval of plausible values.

(a)

The scatter plot of the data displays a linear trend with negative slope. The variance σ² appears to be fairly large and appears not to depend on x over the sampled range of distance upslope values. The proportion of y-variation explained by the simple linear regression model is 61.6%. The simple linear regression model appears to be plausible.


Regression Analysis: y versus x

The regression equation is

y = 515 - 1.06 x

Predictor Coef SE Coef T P

Constant 515.45 25.14 20.50 0.000

x -1.0601 0.2414 -4.39 0.001

S = 54.80 R-Sq = 61.6% R-Sq(adj) = 58.4%

Analysis of Variance

Source DF SS MS F P

Regression 1 57906 57906 19.28 0.001

Residual Error 12 36037 3003

Total 13 93943

Predicted Values for New Observations

New Obs Fit SE Fit 95.0% CI 95.0% PI

1 435.9 14.8 ( 403.6, 468.2) ( 312.2, 559.6)

Values of Predictors for New Observations

New Obs x

1 75.0

Let $\mu_{Y|x}$ denote the true mean grain sorghum yield when the distance upslope is x.

(b) For testing H0 : $\mu_{Y|x}$ does not depend on x vs. H1 : H0 is false, the observed value of the test statistic $F_{obs}$ = 19.28 leads to the P-value calculation p = P(F(1, 12) ≥ 19.28) = 0.001. Hence, there is very strong evidence against H0. This suggests that the true mean grain sorghum yield does depend on distance upslope on a sloping watershed.

(c) The estimated value of µY |75 is 435.9 g/m-row, and 403.6 < µY |75 < 468.2 with 95% confidence.
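The coefficient estimates, R-Sq, and F statistic in the Minitab output for Example 1 follow from the least-squares formulas given earlier in this section. The sketch below recomputes them in plain Python; the function name and dictionary keys are hypothetical. It reports only the point estimate at x = 75; the confidence interval and the P-value need the t- and F-distributions, which are not part of Python's standard library.

```python
x = [0, 10, 20, 30, 45, 50, 70, 80, 100, 120, 140, 160, 170, 190]
y = [500, 590, 410, 470, 450, 480, 510, 450, 360, 400, 300, 410, 280, 350]

def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x, with the sums of squares used by the F test."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # estimated slope
    b0 = y_bar - b1 * x_bar        # estimated intercept
    fitted = [b0 + b1 * xi for xi in x]
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    regression_ss = sum((fi - y_bar) ** 2 for fi in fitted)
    residual_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mse = residual_ss / (n - 2)
    return {
        "b0": b0,
        "b1": b1,
        "mse": mse,
        "F": regression_ss / mse,          # MSR / MSE, with 1 numerator df
        "r_sq": regression_ss / total_ss,  # proportion of y-variation explained
    }

fit = simple_linear_regression(x, y)
fit_at_75 = fit["b0"] + fit["b1"] * 75
print(f"y = {fit['b0']:.2f} + ({fit['b1']:.4f}) x")
print(f"R-Sq = {100 * fit['r_sq']:.1f}%, F = {fit['F']:.2f}, fit at x = 75: {fit_at_75:.1f}")
```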

Example 2
An experiment is conducted to study the relationship between the shell height X and shell length Y (each measured in millimetres) in Patelloida pygmaea, a limpet found attached to rocks and shells along sheltered shores in the Indo-Pacific area. These data result:


x y x y x y x y

0.9 3.1 1.9 5.0 2.1 5.6 2.3 5.8

1.5 3.6 1.9 5.3 2.1 5.7 2.3 6.2

1.6 4.3 1.9 5.7 2.1 5.8 2.3 6.3

1.7 4.7 2.0 4.4 2.2 5.2 2.3 6.4

1.7 5.5 2.0 5.2 2.2 5.3 2.4 6.4

1.8 5.7 2.0 5.3 2.2 5.6 2.4 6.3

1.8 5.2 2.1 5.4 2.2 5.8 2.7 6.3


(a) Construct a fitted line plot. Does the simple linear regression model appear to be plausible?

(b) Is there any evidence that the true mean shell length depends on shell height?

(c) Estimate the true mean shell length when shell height is 2.0 mm by giving a 95% confidence interval of plausible values.

(a)

The scatter plot of the data displays a linear trend with positive slope. The variance σ² appears not to depend on x over the sampled range of shell height values. The proportion of y-variation explained by the simple linear regression model is 74.6%. The simple linear regression model appears to be plausible.


Regression Analysis: y versus x

The regression equation is

y = 1.36 + 2.00 x

Predictor Coef SE Coef T P

Constant 1.3611 0.4681 2.91 0.007

x 1.9963 0.2284 8.74 0.000

S = 0.4128 R-Sq = 74.6% R-Sq(adj) = 73.6%

Analysis of Variance

Source DF SS MS F P

Regression 1 13.020 13.020 76.42 0.000

Residual Error 26 4.430 0.170

Total 27 17.450

Unusual Observations

Obs x y Fit SE Fit Residual St Resid

1 0.90 3.1000 3.1577 0.2677 -0.0577 -0.18 X

11 2.00 4.4000 5.3537 0.0782 -0.9537 -2.35R

R denotes an observation with a large standardized residual

X denotes an observation whose X value gives it large influence.

Predicted Values for New Observations

New Obs Fit SE Fit 95.0% CI 95.0% PI

1 5.3537 0.0782 ( 5.1930, 5.5143) ( 4.4901, 6.2172)

Values of Predictors for New Observations

New Obs x

1 2.00

Let µY |x denote the true mean shell length when the shell height is x.

(b) For testing H0 : $\mu_{Y|x}$ does not depend on x vs. H1 : H0 is false, the observed value of the test statistic $F_{obs}$ = 76.42 leads to the P-value calculation p = P(F(1, 26) ≥ 76.42) = 0.000. Hence, there is very strong evidence against H0, which suggests that the true mean shell length does depend on shell height.

(c) The estimated value of µY |2.0 is 5.3537 mm, and 5.1930 < µY |2.0 < 5.5143 with 95% confidence.


6 Brief Introduction to Chi-Square Tests

Chi-Square Test of Independence

Each observation from a random sample of size n is cross-classified into one of r different levels of the row factor, and one of c different levels of the column factor. These categorical data are summarized in a two-way table as follows:

                          Column Factor
                   1      2      · · ·   c      Row Totals
Row       1        O11    O12    · · ·   O1c    n1•
Factor    2        O21    O22    · · ·   O2c    n2•
          ...      ...    ...            ...    ...
          r        Or1    Or2    · · ·   Orc    nr•
Column Totals      n•1    n•2    · · ·   n•c    n

Here $O_{ij}$ = the number of observations having the row factor at level i and the column factor at level j. Let $R_i$ denote the event that a randomly chosen experimental unit has the row factor at level i, and let $C_j$ denote the event that a randomly chosen experimental unit has the column factor at level j. Let $p_{ij} = P(R_i \,\&\, C_j)$. Then $P(R_i) = \sum_{j=1}^{c} p_{ij} = p_{i\bullet}$ (estimated by $n_{i\bullet}/n$), and $P(C_j) = \sum_{i=1}^{r} p_{ij} = p_{\bullet j}$ (estimated by $n_{\bullet j}/n$). The row factor is independent of the column factor if and only if $p_{ij} = p_{i\bullet}\,p_{\bullet j}$ for all i, j. We want to test H0 : $p_{ij} = p_{i\bullet}\,p_{\bullet j}$ for all i, j versus H1 : H0 is false (i.e. there is an association between the row factor and the column factor).

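Let $E_{ij}$ = expected frequency of cell ij = $n p_{ij}$. If H0 is true, then $E_{ij} = n\,p_{i\bullet}\,p_{\bullet j}$, which is estimated by

\[ \hat{E}_{ij} = n\,\frac{n_{i\bullet}}{n}\,\frac{n_{\bullet j}}{n} = \frac{n_{i\bullet}\,n_{\bullet j}}{n}. \]

To test for independence of row factor and column factor, we use the test statistic

\[ X^2 = \sum_{\text{all cells}} \frac{\left(O_{ij} - \hat{E}_{ij}\right)^2}{\hat{E}_{ij}}. \]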

If the null hypothesis is true, the observed cell frequencies should be near those expected. Therefore, small values of the non-negative valued test statistic $X^2$ are consistent with H0, and large values of $X^2$ discredit H0. When the null hypothesis is true, the approximate distribution of the test statistic is Chi-Square with (r − 1)(c − 1) degrees of freedom, provided all of the expected cell frequencies are at least 1 and at most 20% of them are less than 5. Therefore, the P-value corresponding to $X^2_{obs}$ is the probability that a $\chi^2_{(r-1)(c-1)}$ random variable will take a value at least as large as $X^2_{obs}$.


Example
In an investigation to determine whether or not there is an association between heart disease and snoring, the following data were obtained from a random sample of 2484 respondents. (source: Norton and Dunn 1985)

                                  Frequency of Snoring
                   Non-snorers   Occasional   Snore nearly   Snore every   Row
                                 snorers      every night    night         Totals
Heart     Absent      1355          603           192            224       2374
Disease   Present       24           35            21             30        110
Column Totals         1379          638           213            254       2484


(a) Run a Chi-Square Test to determine whether or not the row factor is independent of the column factor.

(b) Specify H0 and H1 for this Chi-Square Test.

(c) Give your conclusion based on the P-value. If you find substantial evidence against H0, discuss the main contributors to a large observed value of the test statistic.

(a) Chi-Square Test: C1, C2, C3, C4

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

             C1       C2       C3       C4    Total
    1      1355      603      192      224     2374
        1317.93   609.75   203.57   242.75
          1.043    0.075    0.657    1.449

    2        24       35       21       30      110
          61.07    28.25     9.43    11.25
         22.499    1.611   14.186   31.262

Total      1379      638      213      254     2484

Chi-Sq = 72.782, DF = 3, P-Value = 0.000

(b) H0 : The absence or presence of heart disease is independent of the frequency of snoring.
versus
H1 : There is an association between heart disease and frequency of snoring.


(c) There is very strong evidence (P-value = 0.000) against H0. The two main contributors to a large $X^2_{obs}$ are the second row, first column cell showing many fewer non-snorers with heart disease than would be expected under the null hypothesis, and the second row, fourth column cell showing many more snorers with heart disease than would be expected under the null hypothesis.
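The Minitab output in part (a) can be reproduced by hand from the formulas of this section. The following is a minimal Python sketch, not part of the original course materials; the function name `chi_square_table` is an illustrative choice. It computes the estimated expected counts $n_{i\bullet} n_{\bullet j}/n$, the per-cell contributions, and the test statistic for the snoring data.

```python
# Observed counts from the snoring example: rows = heart disease (absent, present),
# columns = snoring frequency (non-snorer, occasional, nearly every night, every night).
observed = [
    [1355, 603, 192, 224],
    [24, 35, 21, 30],
]


def chi_square_table(observed):
    """Return (expected counts, per-cell contributions, X^2, degrees of freedom)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Estimated expected count for cell (i, j) is (row total i)(column total j) / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    contributions = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    x2 = sum(sum(row) for row in contributions)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, contributions, x2, df


expected, contributions, x2, df = chi_square_table(observed)
print(f"X^2 = {x2:.3f}, DF = {df}")
for exp_row, con_row in zip(expected, contributions):
    print([round(e, 2) for e in exp_row], [round(c, 3) for c in con_row])
```

The largest entries of `contributions` are the cells singled out in part (c): second row, first column and second row, fourth column.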

Chi-Square Test of Homogeneity

Compare r different populations where each member of each population belongs to one of c different categories. The data consist of r independent random samples of sizes n1•, n2•, . . . , nr•, respectively. Each observation is classified into one of c different column categories. These categorical data are summarized in a two-way table as follows:

                          Column Category
                    1      2      · · ·    c      Sample Sizes
               1    O11    O12    · · ·    O1c    n1•
Populations    2    O21    O22    · · ·    O2c    n2•
               ...  ...    ...             ...    ...
               r    Or1    Or2    · · ·    Orc    nr•
Column Totals       n•1    n•2    · · ·    n•c    n

Here $O_{ij}$ = the number of observations from random sample i that fall into column category j. Let $(p_{i1}\; p_{i2}\; \cdots\; p_{ic})$ = true proportion vector for population i.

We want to test
H0 : all of the r proportion vectors are the same (let $(p_1\; p_2\; \cdots\; p_c)$ denote the common value)
versus
H1 : H0 is false

Let $E_{ij}$ = expected frequency of category j in sample i = $n_{i\bullet}\,p_{ij}$. If $H_0$ is true, then $E_{ij} = n_{i\bullet}\,p_j$, which is estimated by

$$E_{ij} = \frac{n_{i\bullet}\, n_{\bullet j}}{n}.$$

To test for homogeneity of population proportion vectors, we use the test statistic

$$X^2 = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}.$$

If the null hypothesis is true, the observed cell frequencies should be near those expected. Therefore, small values of the non-negative valued test statistic $X^2$ are consistent with $H_0$, and large values of $X^2$ discredit $H_0$. When the null hypothesis is true, the approximate distribution of the test statistic is Chi-Square with (r − 1)(c − 1) degrees of freedom, provided all of the expected cell frequencies are at least 1 and at most 20% of them are less than 5. Therefore, the P-value corresponding to $X^2_{obs}$ is the probability that a $\chi^2_{(r-1)(c-1)}$ random variable will take a value at least as large as $X^2_{obs}$.


Example
To study the effect of soil condition on the growth of a new hybrid plant, saplings were planted on three types of soil (clay loam, sandy loam, silty loam) and their subsequent growth classified into three categories (poor, average, good). The following data were obtained:

                          Growth
                     Poor   Average   Good   Sample Sizes
Soil   Clay Loam      16      26       18        60
Type   Sandy Loam      8      16       36        60
       Silty Loam     14      21       25        60
Column Totals         38      63       79       180


(a) Run a Chi-Square Test to determine whether or not the distribution of quality of growth appears to be different for the different soil types.

(b) Specify H0 and H1 for this Chi-Square Test.

(c) Give your conclusion based on the P-value. If you find substantial evidence against H0, discuss the main contributors to a large observed value of the test statistic.

(a) Chi-Square Test: C1, C2, C3

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

            C1      C2      C3   Total
    1       16      26      18      60
         12.67   21.00   26.33
         0.877   1.190   2.637

    2        8      16      36      60
         12.67   21.00   26.33
         1.719   1.190   3.549

    3       14      21      25      60
         12.67   21.00   26.33
         0.140   0.000   0.068

Total       38      63      79     180

Chi-Sq = 11.371, DF = 4, P-Value = 0.023


(b) H0 : The distribution of growth quality is the same for the three soil types.
versus
H1 : The distribution of growth quality is not the same for the three soil types.

(c) There is strong evidence (P-value = 0.023) against H0. The two main contributors to a large $X^2_{obs}$ are the second row, third column cell showing many more plants in sandy loam having good growth than would be expected under the null hypothesis, and the first row, third column cell showing many fewer plants in clay loam having good growth than would be expected under the null hypothesis.
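The P-value in part (a) is an upper-tail Chi-Square probability. As a hedged illustration (not part of the original notes), the Python sketch below recomputes $X^2$ for the soil data and then evaluates the tail probability using the closed form that holds when the degrees of freedom are even, $P(\chi^2_{2k} \ge x) = e^{-x/2}\sum_{j=0}^{k-1}(x/2)^j/j!$. The helper name `chi2_upper_tail_even_df` is an assumption made for this example; for odd degrees of freedom a table or statistical software would be needed instead.

```python
import math

# Observed growth counts (poor, average, good) for each soil type.
observed = [
    [16, 26, 18],   # clay loam
    [8, 16, 36],    # sandy loam
    [14, 21, 25],   # silty loam
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# X^2 = sum over all cells of (O - E)^2 / E, with E = (row total)(column total) / n.
x2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        x2 += (o - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)


def chi2_upper_tail_even_df(x, df):
    """P(Chi-Square with an even df >= x), via the closed-form finite sum."""
    if df % 2 != 0:
        raise ValueError("this closed form only covers even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


p_value = chi2_upper_tail_even_df(x2, df)
print(f"X^2 = {x2:.3f}, DF = {df}, P-value = {p_value:.3f}")
```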


7 Formula List

(provided for midterm tests and final examination)

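$\frac{1}{n}\sum_{i=1}^{n} x_i$

$\sum_{\text{all } x} x f(x)$

$\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1}$

$\sum_{\text{all } x} x^2 f(x) - \mu^2$

$\binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!(n-x)!}\, p^x (1-p)^{n-x}$

$\frac{(\lambda s)^x}{x!}\, e^{-\lambda s}$

$\frac{s}{\sqrt{n}}$

$\sqrt{\frac{p(1-p)}{n}}$

$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

$\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

$\sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

$\gamma$ = integer part of $\frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$

estimate ± (c.v.)(e.s.e.)

$\frac{\text{estimate} - \text{param. value under } H_0}{\text{e.s.e. or (s.e. under } H_0)}$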


8 Flash Card Formula Review

ITEM QUESTION SIDE OF FLASH-CARD ANSWER SIDE OF FLASH CARD

1  Q: In a Density Histogram, each rectangle area equals
   A: relative frequency of corresponding interval.

2  Q: For an observed sample x1, x2, . . . , xn: sample mean $\bar{x}$ = ; sample median $\tilde{x}$ = ; sample variance $s^2$ = ; sample standard deviation s =
   A: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; $\tilde{x}$ = middle ranked observation (n odd), or average of two middle ranked observations (n even); $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$; $s = \sqrt{s^2}$

3  Q: B1, B2, . . . , Bn are mutually exclusive if and only if
   A: Bi and Bj = ∅ for all i ≠ j

4  Q: P(B) =
   A: the chance that event B will occur on any trial

5  Q: P(B′) =
   A: 1 − P(B)

6  Q: P(A or B) =
   A: P(A) + P(B) − P(A and B)

7  Q: P(A|B) =
   A: P(A and B) ÷ P(B)

8  Q: P(A and B) =
   A: P(A)P(B|A) and P(B)P(A|B)

9  Q: A & B are independent if and only if P(A and B) =
   A: P(A)P(B)

10 Q: The cdf of rv X is F(x) =
   A: P(X ≤ x)

11 Q: The density for discrete rv X is f(x) =
   A: P(X = x)

12 Q: If rv X is discrete, P(a ≤ X ≤ b) =
   A: $\sum_{a \le x \le b} P(X = x) = \sum_{a \le x \le b} f(x)$

13 Q: If rv X is discrete, E(X) = µ =
   A: $\sum_{\text{all } x} x\,P(X = x) = \sum_{\text{all } x} x f(x)$

14 Q: If rv X is discrete, E(g(X)) =
   A: $\sum_{\text{all } x} g(x)\,P(X = x) = \sum_{\text{all } x} g(x) f(x)$



15 Q: V(X) = σ² =
   A: E((X − µ)²) = E(X²) − µ²

16 Q: SD(X) = σ =
   A: $\sqrt{V(X)}$

17 Q: If X = total number of successes out of n independent trials where P(success) = p on every trial, then the distribution of X is:
   A: Binomial(n, p)

18 Q: If X ∼ Binomial(n, p), then formulas for density, mean value, and standard deviation are:
   A: $\binom{n}{x} p^x (1-p)^{n-x}$, $np$, $\sqrt{np(1-p)}$

19 Q: If arrivals occur at random in time (or space) at the average rate of λ per unit time (or space), and X = total number of arrivals that occur in a time (or space) window of size s, then the distribution of X is:
   A: Poisson(λs)

20 Q: If X ∼ Poisson(λs), then formulas for density, mean value, and standard deviation are:
   A: $\frac{(\lambda s)^x}{x!}\, e^{-\lambda s}$, $\lambda s$, $\sqrt{\lambda s}$

21 Q: If X ∼ Binomial(n, p), with n ≥ 100, p ≤ 0.01, and np ≤ 20, then the distribution of X is well approximated by:
   A: Poisson(λs = np)

22 Q: If rv X is continuous with density f, then P(a ≤ X ≤ b) = P(a < X < b) =
   A: area under density curve between a and b

23 Q: If rv X is continuous with density f, then E(X) = µ =
   A: balance point for the distribution of X

24 Q: If X ∼ N(µ, σ), then the distribution of $Z = \frac{X - \mu}{\sigma}$ is:
   A: Standard Normal

25 Q: E(c) =
   A: c

26 Q: E(cX) =
   A: cE(X)

27 Q: E(X + Y) =
   A: E(X) + E(Y)

28 Q: V(c) =
   A: 0

29 Q: V(cX) =
   A: c²V(X)

30 Q: Two random variables are independent if and only if
   A: the value assumed by one variable has no influence on the value assumed by the other

31 Q: If X and Y are independent, V(X + Y) =
   A: V(X) + V(Y)



32 Q: X1, X2, . . . , Xn is a random sample from the distribution of X provided:
   A: these random variables are independent and each rv has the same distribution as X

33 Q: If X1, X2, . . . , Xn is a random sample from a population distribution with mean µ and standard deviation σ, then:
      • the sample mean $\bar{X}$ has mean value and standard deviation:
      • the sample variance $S^2$ has mean value:
      • the Central Limit Theorem states that if n is sufficiently large (n > 25 usually will suffice) the approximate distribution of $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is:
   A: • $\mu$, $\frac{\sigma}{\sqrt{n}}$
      • $\sigma^2$
      • Standard Normal

34 Q: $\hat{\theta}$ is an unbiased estimator for θ provided:
   A: $E(\hat{\theta}) = \theta$

35 Q: The standard error of $\hat{\theta}$ for estimating θ is:
   A: $SD(\hat{\theta})$

36 Q: A good unbiased estimator for µ is:
   A: $\bar{X}$

37 Q: A good unbiased estimator for σ² is:
   A: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$

38 Q: A good, slightly biased estimator for σ is:
   A: S

39 Q: The estimated standard error (ese) of $\bar{x}$ for estimating µ is:
   A: $\frac{s}{\sqrt{n}}$

40 Q: Critical value (cv) $z_{\alpha/2}$ satisfies $P(Z > z_{\alpha/2})$ =
   A: α/2

41 Q: Critical value (cv) $t_{\alpha/2,\gamma}$ satisfies $P(T_{(\gamma)} > t_{\alpha/2,\gamma})$ =
   A: α/2

42 Q: 100(1 − α)% confidence interval for µ has the form:
   A: estimate ± $(cv_{\alpha/2})$(ese)

43 Q: For testing hypotheses about µ, the test statistic has the form:
   A: $\frac{\text{estimate} - \text{parameter value under } H_0}{\text{ese}}$

44 Q: To compute cv or P-value with unknown σ,
      (i) with a large sample (n > 40) use:
      (ii) with a small sample (n ≤ 40) and a near-normal population distribution use:
   A: (i) Standard Normal distribution
      (ii) t-distribution with n − 1 degrees of freedom



45 Q: P-value = the probability, under H0, that a rerun of the experiment would yield:
   A: evidence against H0 in favour of H1 at least as strong as the observed evidence

46 Q: If P-value ≤ 0.01, there is:
   A: very strong evidence against H0 in favour of H1

47 Q: If 0.01 < P-value < 0.05, there is:
   A: strong evidence against H0 in favour of H1

48 Q: If 0.05 < P-value < 0.10, there is:
   A: moderate evidence against H0 in favour of H1

49 Q: If 0.10 < P-value, there is:
   A: little or no evidence against H0 in favour of H1

50 Q: If p is the population proportion, then:
      • the sample proportion $\hat{p}$ has mean value, standard deviation:
      • the Central Limit Theorem says that if n > 25, np ≥ 10, and n(1 − p) ≥ 10, then the approximate distribution of $\frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$ is:
   A: • $p$, $\sqrt{\frac{p(1-p)}{n}}$
      • Standard Normal

51 Q: A good unbiased estimator for p is:
   A: $\hat{p}$

52 Q: Name good unbiased estimators for the following population parameters: $\mu_1 - \mu_2$, $p_1 - p_2$, $\mu_D$
   A: $\bar{X}_1 - \bar{X}_2$, $\hat{p}_1 - \hat{p}_2$, $\bar{D}$

53 Q: Based on independent random samples from two normally distributed populations having homogeneous variance, the most efficient estimator for the common variance σ² among all unbiased estimators that are linear combinations of $S_1^2$ and $S_2^2$ is:
   A: $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$

54 Q: The estimated standard error (ese) of $\hat{p}$ for estimating p is:
   A: $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

55 Q: The estimated standard error (ese) of $\hat{p}_1 - \hat{p}_2$ for estimating $p_1 - p_2$ is:
   A: $\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

56 Q: The estimated standard error (ese) of $\bar{x}_1 - \bar{x}_2$ for estimating $\mu_1 - \mu_2$ is:
   A: $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ or $\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ (see item 57)

57 Q: In item 56 the only case where the second formula for ese is used is when:
   A: n1 ≤ 40 and/or n2 ≤ 40, both population distributions are near normal, and $\frac{\text{larger of } s_1^2, s_2^2}{\text{smaller of } s_1^2, s_2^2} < 2$, suggesting $\sigma_1^2 \approx \sigma_2^2$



58 Q: The estimated standard error (ese) of $\bar{d}$ for estimating $\mu_D$ is:
   A: $\frac{s_D}{\sqrt{n}}$

59 Q: 100(1 − α)% confidence intervals for $\mu$, $p$, $\mu_1 - \mu_2$, $p_1 - p_2$, $\mu_D$ have the form:
   A: estimate ± $(cv_{\alpha/2})$(ese)

60 Q: For testing hypotheses about $\mu$, $\mu_1 - \mu_2$, $\mu_D$ and $p_1 - p_2$ the test statistics have the form:
   A: $\frac{\text{estimate} - \text{parameter value under } H_0}{\text{ese}}$

61 Q: For testing H0 : p = p0, the test statistic has the form:
   A: $\frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

62 Q: To compute cv or P-value
      (i) with n1 > 40 and n2 > 40 use:
      (ii) with n1 ≤ 40 and/or n2 ≤ 40, near-normal population distributions, and $\frac{\text{larger of } s_1^2, s_2^2}{\text{smaller of } s_1^2, s_2^2} \le 2$, use:
      (iii) with n1 ≤ 40 and/or n2 ≤ 40, near-normal population distributions, and $\frac{\text{larger of } s_1^2, s_2^2}{\text{smaller of } s_1^2, s_2^2} > 2$, use:
   A: (i) Standard Normal distribution
      (ii) t-distribution with df = n1 + n2 − 2
      (iii) t-distribution with df = γ on Formula list

63 Q: For paired data, analyze:
   A: the single sample of differences

64 Q: A user should reject H0 at significance level α if and only if the P-value of the data is:
   A: less than or equal to α

65 Q: Assumption underlying large-sample Z procedures for analyzing a single sample or paired differences:
   A: the data constitute an observed random sample

66 Q: Assumption underlying large-sample Z procedures for analyzing two-sample data sets:
   A: the data constitute two independent observed random samples

67 Q: Assumptions underlying small-sample T procedures for analyzing a single sample or paired differences:
   A: (i) the data constitute an observed random sample, and (ii) the population distribution is near normal

68 Q: Assumptions underlying small-sample pooled T procedures for analyzing two-sample data sets:
   A: (i) the data constitute two independent observed random samples, (ii) the population distributions are near normal, and (iii) the population variances are equal

69 Q: Assumptions underlying small-sample (Smith-Satterthwaite) unpooled T procedures for analyzing two-sample data sets:
   A: (i) the data constitute two independent observed random samples, (ii) the population distributions are near normal, and (iii) the population variances are not equal
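The decision rule in items 56, 57 and 62 for two-sample inferences on µ1 − µ2 can be summarized as a small routine. The Python sketch below is illustrative only and not part of the course materials: the function name `two_sample_ese` and the sample values in the usage lines are assumptions, and the small-sample branches presuppose near-normal populations (items 68 and 69), which the code cannot check.

```python
import math


def two_sample_ese(n1, s1, n2, s2):
    """Return (ese, reference distribution, df) following items 56, 57 and 62."""
    v1, v2 = s1 ** 2, s2 ** 2
    unpooled = math.sqrt(v1 / n1 + v2 / n2)
    if n1 > 40 and n2 > 40:
        # Item 62(i): large samples, Standard Normal reference distribution.
        return unpooled, "Standard Normal", None
    # Boundary follows item 62(ii), which uses "<= 2" for the variance ratio.
    if max(v1, v2) / min(v1, v2) <= 2:
        # Pooled variance from item 53, second ese formula of item 56.
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        return math.sqrt(sp2 * (1 / n1 + 1 / n2)), "t", n1 + n2 - 2
    # Item 62(iii): unpooled ese with the Smith-Satterthwaite df (gamma).
    gamma = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return unpooled, "t", int(gamma)  # integer part, as on the Formula List


# Illustrative inputs: similar sample variances, then very different ones.
print(two_sample_ese(6, 2.0, 9, 1.5))
print(two_sample_ese(10, 3.0, 12, 1.0))
```

With the first set of inputs the variance ratio is 4/2.25 ≈ 1.78, so the pooled formula and df = n1 + n2 − 2 = 13 apply; with the second the ratio is 9, so the unpooled formula and γ apply.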


9 First Block Sample Test

Instructions

• The Sharp EL-510R scientific calculator is allowed. This is the only calculator that is allowed. A Formula List page is provided. NO other aids such as books, notes, or scratch paper are permitted.

• Questions 1 through 9 are multiple-choice questions. For questions requiring numerical answers, the 10 choices are listed in numerically increasing order. Choose the value that is nearest your (unrounded) answer. In the special case that your (unrounded) answer is equidistant from the two nearest choices, choose the larger of these two choices. For verification purposes, show all calculations on your question paper. Unverified answers may be disallowed.

• Questions 10 and 11 are full-answer questions. For each of these questions, write out your solution carefully and completely. Marks will be deducted for incomplete or poorly presented solutions.

Questions 1 and 2 refer to the following setup.
The paper “The Pedalling Technique of Elite Endurance Cyclists” reported the accompanying data on single-leg power at a high workload:

228 233 190 187 226 183 156 189 202 283 237 174 213

1. The ratio $s/\bar{x}$, the sample standard deviation divided by the sample mean, is called the coefficient of variation. Compute the coefficient of variation for these data.

(A) 0.15 (B) 0.17 (C) 0.20 (D) 0.25 (E) 0.30

(F) 0.50 (G) 0.75 (H) 1.50 (I) 2.50 (J) 5.00

2. Compute the sample median for these data.

(A) 185 (B) 190 (C) 195 (D) 200 (E) 205

(F) 210 (G) 215 (H) 220 (I) 225 (J) 230

3. In the pea plant, yellow seeds (Y) are dominant to green (y), and the round shape (R) is dominant to wrinkled (r). Suppose that two double-heterozygous (YyRr) plants are cross-matched. What is the probability that the cross-match will result in a pea plant with yellow, round seeds?

(A) 0.05 (B) 0.15 (C) 0.25 (D) 0.35 (E) 0.45

(F) 0.55 (G) 0.65 (H) 0.75 (I) 0.85 (J) 0.95


4. The following table shows the cumulative distribution function (cdf) for the discrete random variable X, the number of wing beats per second (rounded to the nearest second) of a species of large moth while in flight.

x 2 3 4 5 6

F (x) 0.05 0.13 0.22 0.50 1

Find the expected value of X.

(A) 4.3 (B) 4.6 (C) 4.9 (D) 5.2 (E) 5.5

(F) 6.0 (G) 7.0 (H) 8.0 (I) 9.0 (J) 10.0

5. To meet the demand by farmers for cottonwood saplings to use as windbreaks, forestry service employees sampled farmers in the province. They found that 35% had acquired trees from the service in prior years, 25% anticipated ordering trees from the service in the coming year, and 15% had acquired trees in prior years and anticipated ordering additional trees in the coming year. What is the probability that a randomly chosen farmer did not acquire trees from the service in prior years and does not anticipate ordering trees from the service in the coming year?

(A) 0.15 (B) 0.20 (C) 0.25 (D) 0.30 (E) 0.35

(F) 0.40 (G) 0.45 (H) 0.50 (I) 0.55 (J) 0.60

6. A study was conducted to investigate a new procedure for detecting renal disease in patients with hypertension. Using the new procedure, experimenters screened a random sample of 200 hypertensive patients. Then the presence or absence of renal disease was determined by another method. The data obtained are shown in the following table:

                                              Disease   Disease
                                              Absent    Present
Disease Detected by the New Procedure            18        22
Disease Not Detected by the New Procedure       155         5

Use these data to estimate the false-positive rate for the new procedure.

(A) Not possible (B) 0.04 (C) 0.06 (D) 0.08 (E) 0.10

(F) 0.12 (G) 0.14 (H) 0.16 (I) 0.20 (J) 0.24

7. Suppose the germination rate for a certain stock of Douglas Fir seed is 75%. If three of these seeds are planted, what is the probability that at least two of the three seeds will germinate?

(A) 0.50 (B) 0.55 (C) 0.60 (D) 0.65 (E) 0.70

(F) 0.75 (G) 0.80 (H) 0.85 (I) 0.90 (J) 0.95


Questions 8 and 9 refer to the following setup.
Cells in damaged tissue being examined under the microscope are graded for extent of damage by the following scale: 0, undamaged; 1, slightly damaged; 2, moderately damaged; 3, extensively damaged; 4, very severely damaged. Cells of tissue exposed to 20 minutes of anoxia, an abnormally low oxygen supply, before preparation for microscopic study exhibit the following density, where X is the classification value for damage.

x 0 1 2 3 4

f(x) 0.15 0.20 0.35 0.25 0.05

8. Given that a randomly chosen cell shows some damage, what is the probability that the chosen cell is neither extensively damaged nor very severely damaged?

(A) 0.05 (B) 0.15 (C) 0.25 (D) 0.35 (E) 0.45

(F) 0.55 (G) 0.65 (H) 0.75 (I) 0.85 (J) 0.95

9. Compute $E\left((X - 2)^2\right)$.

(A) 0.8 (B) 1.0 (C) 1.2 (D) 1.4 (E) 1.6

(F) 1.8 (G) 2.0 (H) 3.0 (I) 4.0 (J) 5.0

10. Thirty percent of the members of a certain population have condition Beta. A diagnostic test for this condition has a 10% false-positive rate and a 15% false-negative rate. Suppose this test is administered to a randomly chosen member of this population.

(a) What is the probability that the chosen member is wrongly diagnosed by the test?

(b) If the chosen member tests positive (indicating the presence of condition Beta), what is the probability it has condition Beta?

11. In a study of a certain population it is found that 40% of the population has characteristic A, 45% of the population has characteristic B, 42% of the population has characteristic C, 68% has A or B, 67% has A or C, 62% has B or C, and 80% has A or B or C. What is the probability that a randomly chosen member of this population will have characteristic A and exactly one of the other two characteristics?

ANSWERS FOR FIRST BLOCK SAMPLE TEST

1. 0.1614 (B) 2. 202 (D) 3. 0.5625 (F)

4. 5.10 (D) 5. 0.55 (I) 6. 0.104 (E)

7. 0.844 (H) 8. 0.647 (G) 9. 1.25 (C)

10. (a) (0.3)(0.15) + (0.7)(0.10) = 0.115 (b) $\frac{(0.3)(0.85)}{(0.3)(0.85) + (0.7)(0.10)} = 0.785$

11. 0.07 + 0.05 = 0.12
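Some of these answers can be checked numerically. The following Python sketch is offered only as a study aid and is not part of the original answer key; the variable names are illustrative. It recomputes the coefficient of variation and median for Questions 1 and 2, and the two probabilities for Question 10.

```python
import statistics

# Questions 1 and 2: single-leg power data.
power = [228, 233, 190, 187, 226, 183, 156, 189, 202, 283, 237, 174, 213]
cv = statistics.stdev(power) / statistics.mean(power)   # stdev uses the n - 1 divisor
median = statistics.median(power)

# Question 10: prevalence 30%, false-positive rate 10%, false-negative rate 15%.
prevalence, false_pos, false_neg = 0.30, 0.10, 0.15
# (a) A wrong diagnosis is a false negative on a case or a false positive on a non-case.
p_wrong = prevalence * false_neg + (1 - prevalence) * false_pos
# (b) Bayes' rule: P(Beta | positive) = P(Beta and positive) / P(positive).
p_positive = prevalence * (1 - false_neg) + (1 - prevalence) * false_pos
p_beta_given_positive = prevalence * (1 - false_neg) / p_positive

print(round(cv, 4), median, round(p_wrong, 3), round(p_beta_given_positive, 3))
```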


10 Second Block Sample Test

Instructions




1. Ornithosis is a pneumonia-like disease in turkeys that has a 40% fatality rate. If 16 turkeys contract this disease, what is the probability that between 8 and 12 of them, inclusive, will recover from the disease?

(A) 0.05 (B) 0.15 (C) 0.25 (D) 0.35 (E) 0.45

(F) 0.55 (G) 0.65 (H) 0.75 (I) 0.85 (J) 0.95

Questions 2 and 3 refer to the following setup.
Each trial of a certain lab procedure is scored 0 (total failure), 1 (marginal result), 2 (good result), or 3 (excellent result) according to the population distribution:

x 0 1 2 3

f(x) 0.1 0.2 0.4 0.3

Let $\bar{X}$ denote the sample mean score from two independent trials of this procedure.

2. What is the probability that $\bar{X}$ will equal 1.5?

(A) 0.11 (B) 0.14 (C) 0.17 (D) 0.20 (E) 0.23

(F) 0.26 (G) 0.29 (H) 0.32 (I) 0.35 (J) 0.38

3. The standard deviation of $\bar{X}$ is sometimes called the standard error of $\bar{X}$. Compute the standard error of $\bar{X}$. (Hint: First find the population standard deviation σ.)

(A) 0.35 (B) 0.40 (C) 0.45 (D) 0.50 (E) 0.55

(F) 0.60 (G) 0.70 (H) 0.80 (I) 0.90 (J) 1.00


4. In a certain culture, Rickettsia typhi cells occur at random throughout the culture at the average rate of 3 per 10 square micrometers. What is the probability that a 20 square micrometer sample of this culture will contain at least 8 such cells?

(A) 0.05 (B) 0.15 (C) 0.25 (D) 0.35 (E) 0.45

(F) 0.55 (G) 0.65 (H) 0.75 (I) 0.85 (J) 0.95

Questions 5 and 6 refer to the following setup.
Among a certain population of primates, the volume of the cranial cavity is normally distributed with mean 1300 cc and standard deviation 180 cc.

5. Find the probability that a randomly chosen member of the population will have a cranial cavity volume smaller than 1500 cc.

(A) 0.65 (B) 0.68 (C) 0.71 (D) 0.74 (E) 0.77

(F) 0.80 (G) 0.83 (H) 0.86 (I) 0.89 (J) 0.92

6. Find the value d such that 70% of these primates have cranial cavity volumes between 1300 − d and 1300 + d cubic centimetres.

(A) 154 (B) 158 (C) 162 (D) 166 (E) 170

(F) 174 (G) 178 (H) 182 (I) 186 (J) 190

7. In a study of the nutritional qualities of fast foods, the amount of fat was measured for a random sample of 60 hamburgers from a particular restaurant chain. The sample mean and sample standard deviation were found to be 45.7 grams and 10.3 grams, respectively. Compute the upper limit of a 95% confidence interval for the true mean fat content in hamburgers served in these restaurants.

(A) 44.5 (B) 45.0 (C) 45.5 (D) 46.0 (E) 46.5

(F) 47.0 (G) 47.5 (H) 48.0 (I) 48.5 (J) 49.0

Questions 8 and 9 refer to the following setup.
Duck farms lining the shores of Great South Bay have seriously polluted the water. One pollutant is nitrogen in the form of uric acid. The following is a sample of 9 observations on X, the number of kilograms of nitrogen produced per farm per day:

2.4 4.9 3.1 3.3 1.2 2.5 5.8 3.0 2.6

8. Compute the lower limit of a 99% confidence interval for the true mean amount of nitrogen produced per farm per day in the sampled region.

(A) 1.50 (B) 1.55 (C) 1.60 (D) 1.65 (E) 1.70

(F) 1.75 (G) 1.80 (H) 1.85 (I) 1.90 (J) 1.95


9. Which of the following statements are true?

(i) An assumption underlying the computation in question 8 is: The shape of the sampled population distribution is near normal.

(ii) Based on the observed data, the estimated standard error is 1.3784 kilograms.

(iii) An assumption underlying the computation in question 8 is: The data constitute an observed random sample.

(A) None (B) Only (i) (C) Only (ii) (D) Only (iii)

(E) (i) & (ii) (F) (i) & (iii) (G) (ii) & (iii) (H) All

10. Consider an individual whose white cell count is 6000 per cubic millimetre of blood, and assume that white cells are randomly distributed in blood. If 3 different 0.001 mm³ drops of blood from this individual are analysed, what is the probability that at most one of these three drops will contain fewer than five white cells?

11. Suppose the moisture content per kilogram of a dehydrated protein concentrate has mean 8 mg and standard deviation 4 mg. A random sample of 42 specimens, each specimen consisting of one kilogram of this concentrate, is to be tested.

(a) Find the expected value and variance of the sample mean moisture content per kilogram for this random sample.

(b) Find the probability that the sample mean moisture content per kilogram will be above 9 mg.

ANSWERS FOR SECOND BLOCK SAMPLE TEST

1. 0.7926 (H) 2. 0.22 (E) 3. 0.667 (G)

4. 0.256 (C) 5. 0.8665 (H) 6. 187.2 (I)

7. 48.31 (I) 8. 1.658 (D) 9. (F)

10. $(0.715)^3 + \binom{3}{1}(0.285)(0.715)^2 = 0.803$

11. (a) $E(\bar{X}) = \mu = 8$ mg, $V(\bar{X}) = \frac{\sigma^2}{n} = \frac{16}{42} = 0.381$ mg²

(b) $P(\bar{X} > 9) \approx P(Z > 1.62) = 0.0526$, using the Central Limit Theorem
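The Binomial and Poisson answers above (Questions 1, 4 and 10) come from summing density values. The Python sketch below is an illustrative check rather than part of the answer key; the helper names `binom_pmf` and `poisson_cdf` are assumptions chosen for this example, and they implement the density formulas on the Formula List directly.

```python
import math


def binom_pmf(k, n, p):
    """Binomial(n, p) density at k."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(mean ** x * math.exp(-mean) / math.factorial(x) for x in range(k + 1))


# Question 1: recoveries ~ Binomial(16, 0.6); P(8 <= X <= 12).
q1 = sum(binom_pmf(k, 16, 0.6) for k in range(8, 13))

# Question 4: cells in 20 square micrometers ~ Poisson(0.3 * 20 = 6); P(X >= 8).
q4 = 1 - poisson_cdf(7, 6)

# Question 10: cells per drop ~ Poisson(6000 * 0.001 = 6).
p_few = poisson_cdf(4, 6)                                 # drop has fewer than five cells
q10 = sum(binom_pmf(k, 3, p_few) for k in range(0, 2))    # at most one of three drops

print(round(q1, 4), round(q4, 3), round(q10, 3))
```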


11 Third Block Sample Test

Instructions



• Question 10 is a full-answer question. Write out your solution carefully and completely. Marks will be deducted for incomplete or poorly presented solutions.

Questions 1 through 3 refer to the following setup.
In running a white cell count, a drop of blood is smeared thinly and evenly on a glass slide, stained with Wright’s stain, and examined under a microscope. Of 112 white cells counted, 84 were neutrophils, a white cell produced in the bone marrow whose function, in part, is to take up infective agents in the blood. Let p denote the true proportion of neutrophils among the white cells in this individual.

1. Compute the lower limit of an 82% confidence interval for p.

(A) 0.660 (B) 0.665 (C) 0.670 (D) 0.675 (E) 0.680

(F) 0.685 (G) 0.690 (H) 0.695 (I) 0.700 (J) 0.705

2. Is there any evidence that the true proportion of neutrophils among the white cells in this individual is above 0.70? Compute the P-value for the appropriate test.

(A) 0.03 (B) 0.05 (C) 0.07 (D) 0.09 (E) 0.11

(F) 0.13 (G) 0.15 (H) 0.17 (I) 0.19 (J) 0.21

3. Use the results of the sample of size 112 as a pilot study to determine the sample size needed to estimate p within ±4 percentage points with 90% confidence.

(A) 200 (B) 250 (C) 300 (D) 350 (E) 400

(F) 450 (G) 500 (H) 600 (I) 700 (J) 800


Questions 4 through 6 refer to the following setup.
Each species of firefly has a unique flashing pattern. One species has a pattern that consists of one short pulse followed by a resting period thought to have an average length µ of approximately 4 seconds. Data on the resting time between flashes for a sample of 53 fireflies of this species yielded a sample mean resting time of 4.2 seconds and sample standard deviation of 0.6 seconds.

4. Do these data contradict the hypothesized mean resting time of µ = 4 seconds? These data provide { (a) very strong (b) strong (c) moderate (d) little or no } evidence against H0 to suggest that { (i) µ > 4, (ii) µ ≠ 4 }. Choose the correct pair.

(A) (a,i) (B) (b,i) (C) (c,i) (D) (d,i)

(E) (a,ii) (F) (b,ii) (G) (c,ii) (H) (d,ii)

5. Which of the following are assumptions needed to ensure the validity of the computations in Question 4?

(i) The 53 observations constitute an observed random sample from the sampled population.

(ii) The population random variable, X = resting time between flashes for a firefly randomly chosen from the sampled population, has a near-normal distribution.

(iii) The estimated standard error = 0.0824.



6. Use these data as a pilot study to find the sample size needed to estimate µ within ±0.1 second with 95% confidence.

(A) 80 (B) 100 (C) 120 (D) 140 (E) 160

(F) 180 (G) 200 (H) 220 (I) 240 (J) 260

Questions 7 through 9 refer to the following setup.
One variable used to compare the physical attributes of female Olympic swimmers and runners is the circumference of the upper arm, in centimetres, while relaxed. The following data from two independent random samples are available:

            Population Parameters   Sample Size   Sample Mean              Sample Std. Dev.
Swimmers    µ1, σ1                  n1 = 6        $\bar{x}_1$ = 27.3 cm    s1 = 2.0 cm
Runners     µ2, σ2                  n2 = 9        $\bar{x}_2$ = 23.5 cm    s2 = 1.5 cm

7. Assuming normally distributed populations with homogeneous variance, estimate the common variance using the pooled sample variance.

(A) 1.7 (B) 2.0 (C) 2.3 (D) 2.6 (E) 2.9

(F) 3.2 (G) 3.5 (H) 3.8 (I) 4.1 (J) 4.4


8. Assuming normality, do these data provide substantial evidence that the true mean circumference of the upper arm is more than 2 cm larger in swimmers than in runners? Compute (or bracket) the P-value for the appropriate test, and then interpret the result by stating at which levels of significance the null hypothesis should be rejected.

(i) α = 0.01 (ii) α = 0.05 (iii) α = 0.10



9. Assuming normality, find the upper limit of a 99% confidence interval for µ1 − µ2.

(A) 5.9 (B) 6.2 (C) 6.5 (D) 6.8 (E) 7.1

(F) 7.4 (G) 7.7 (H) 8.0 (I) 8.3 (J) 8.6

10. The effect of physical training on the triglyceride level was studied by using 4 randomly chosen subjects. The following pre-training and post-training readings (in mg of triglyceride per 100 mL of blood) were obtained:

Subject 1 2 3 4

Pre-training level 68 77 97 116

Post-training level 95 90 131 134

Is there evidence that the true mean pre-training triglyceride level is different from the true mean post-training level? Assume the relevant population distribution(s) is/are normal.

(a) Using standard notation, define the population parameter(s) being tested.

(b) Specify the null and alternative hypotheses.

(c) Compute the observed value of the test statistic.

(d) Specify the distribution to be used for computing the P-value, and give the P-value calculation within Table accuracy.

(e) State your conclusion, and report the estimated value of the parameter being tested and the estimated standard error.


ANSWERS FOR THIRD BLOCK SAMPLE TEST

1. 0.6952 (H) 2. 0.1251 (F) 3. 318 (C)

4. P-value = 0.0150 (F) 5. (B) 6. 139 (D)

7. 2.923 (E) 8. 0.025 < P-value < 0.05 (G) 9. 6.51 (C)

10. (a) µD = true mean difference (pre-training triglyceride level minus post-training level)

(b) H0 : µD = 0 vs. H1 : µD ≠ 0

(c) $T_{obs} = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{-23 - 0}{4.67} = -4.92$

(d) P-value = $P(T_{(3)} \le -4.92 \text{ or } T_{(3)} \ge 4.92) = 2P(T_{(3)} \ge 4.92)$
∴ 0.01 < P-value < 0.02

(e) There is strong evidence (0.01 < P-value < 0.02) against H0 that µD = 0, which suggests that µD ≠ 0. The estimated value of µD is −23 mg/100 mL with estimated standard error = 4.67 mg/100 mL.
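The arithmetic behind part (c) of Question 10 can be laid out step by step. This short Python sketch is an illustrative check, not part of the answer key: it forms the pre-minus-post differences, then computes $\bar{d}$, the estimated standard error $s_d/\sqrt{n}$, and the observed test statistic. The P-value itself still has to be bracketed from the t table with 3 degrees of freedom, as in part (d).

```python
import math
import statistics

pre = [68, 77, 97, 116]
post = [95, 90, 131, 134]

# Paired data: analyze the single sample of differences (pre minus post).
diffs = [a - b for a, b in zip(pre, post)]
d_bar = statistics.mean(diffs)
ese = statistics.stdev(diffs) / math.sqrt(len(diffs))   # s_d / sqrt(n), n - 1 divisor in s_d
t_obs = (d_bar - 0) / ese                               # hypothesized mean difference is 0
df = len(diffs) - 1

print(diffs, d_bar, round(ese, 2), round(t_obs, 2), df)
```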


12 Sample Final Examination 1

Instructions

• The Sharp EL-510R scientific calculator is allowed. This is the only calculator that is allowed. A Formula List page is provided. NO other aids such as books, notes, or scratch paper are permitted.


• Questions 31 through 33 are full-answer questions. For each of these questions, write out your solution carefully and completely. Marks will be deducted for incomplete or poorly presented solutions.

• The multiple-choice questions are worth 2 marks each, questions 31 and 32 are worth 8 marks each, and question 33 is worth 4 marks. The maximum score is 80 marks.

• Note: To conserve paper, no working space is provided on this sample exam. Working space will be provided on the real exam.

1. The following data set is an observed random sample of hydrogen sulphide measurements (in parts per million) produced by anaerobic fermentation of sewage after 42 hours at 37°C.

201 221 218 228 220 227 223 224 202

The coefficient of variation, defined by c.v. = sample standard deviation divided by sample mean, is a dimensionless quantity that measures the amount of variability relative to the value of the mean.
Compute the value of c.v. for these data.

(A) 0.01 (B) 0.03 (C) 0.05 (D) 0.07 (E) 0.09

(F) 0.15 (G) 0.25 (H) 0.35 (I) 0.45 (J) 0.55

2. Peach trees have fuzzy fruits and nectarine trees have smooth fruits. The allele F for fuzziness is dominant over that for smoothness f. Each type of fruit can be either yellow or white. The allele Y for yellow is dominant over that for white y. Suppose a YyFf is crossed with a yyFf. Find the probability of obtaining a yellow peach tree.

(A) 0.05 (B) 0.15 (C) 0.25 (D) 0.35 (E) 0.45

(F) 0.55 (G) 0.65 (H) 0.75 (I) 0.85 (J) 0.95


3. The following data show the results of performing a radiologic diagnostic test for coronary artery disease (CAD) on 200 subjects, of whom 123 were known to actually have the disease and 77 were known to not have the disease:

                      CAD
Test Result     Present   Absent
Positive           95        25
Negative           28        52
Totals            123        77

If possible, use these data to approximate the false-positive rate.

(A) Impossible (B) 0.15 (C) 0.18 (D) 0.21 (E) 0.24

(F) 0.27 (G) 0.30 (H) 0.33 (I) 0.36 (J) 0.39

Questions 4 through 6 refer to the following setup. In a controlled clinical trial to determine the efficacy of an experimental drug for treating migraine headache, each of 90 patients was treated with two drugs, the experimental drug and a placebo, in random order. Each treatment lasted 12 weeks. At the end of the treatment period, the effect of the drug was classified into one of three categories: completely effective (CE); somewhat effective (SE); or not effective (NE). Each of the 90 patients was cross-classified, yielding the following migraine headache data:

                      Response to Placebo
Response to Drug      CE      SE      NE
CE                    10      15      20
SE                    10       9      11
NE                     7       3       5

Suppose one of these 90 patients is chosen at random.

4. What is the probability that the Drug or the Placebo is completely effective on the chosen patient?

(A) 0.35 (B) 0.40 (C) 0.45 (D) 0.50 (E) 0.55

(F) 0.60 (G) 0.65 (H) 0.70 (I) 0.75 (J) 0.80

5. If the Placebo is not effective on the chosen patient, what is the probability the Drug also is not effective?

(A) 0.01 (B) 0.05 (C) 0.10 (D) 0.15 (E) 0.20

(F) 0.25 (G) 0.30 (H) 0.35 (I) 0.40 (J) 0.45



(i) The events “Drug somewhat effective on chosen patient” and “Placebo somewhat effective on chosen patient” are independent.

(ii) The events “Placebo completely effective on chosen patient” and “Placebo not effective on chosen patient” are independent.

(iii) The events “Drug not effective on chosen patient” and “Placebo not effective on chosen patient” are mutually exclusive.



7. A physician orders 2 different diagnostic tests to be run independently on the same true-positive patient. The false-positive rate for the first test is 0.12 and the false-positive rate for the second test is 0.06. The false-negative rate for the first test is 0.16 and the false-negative rate for the second test is 0.10. What is the probability of obtaining at least one erroneous negative test result?

(A) 0.13 (B) 0.16 (C) 0.19 (D) 0.22 (E) 0.25

(F) 0.35 (G) 0.45 (H) 0.55 (I) 0.65 (J) 0.75

Questions 8 and 9 refer to the following setup. Cells in sections of damaged tissue being examined under the microscope are graded for the extent of damage by the following scale: 0, undamaged; 1, slightly damaged; 2, moderately damaged; 3, extensively damaged. Cells of tissue exposed to 20 minutes of anoxia before preparation for microscopic study exhibit the following density, where X is the classification value for tissue damage:

x 0 1 2 3

f(x) 0.10 0.25 0.60 0.05

8. Compute the expected value of X.

(A) 1.2 (B) 1.4 (C) 1.6 (D) 1.8 (E) 2.0

(F) 2.2 (G) 2.4 (H) 2.6 (I) 2.8 (J) 3.0

9. Compute the standard deviation of X.

(A) 0.1 (B) 0.3 (C) 0.5 (D) 0.7 (E) 0.9

(F) 1.1 (G) 1.3 (H) 1.5 (I) 1.7 (J) 1.9


10. Let X denote the number of new AIDS cases diagnosed per day at a certain large metropolitan hospital. Assume that the cumulative distribution function (cdf) for X is:

x 0 1 2 3 4 5

F (x) 0.40 0.65 0.85 0.92 0.97 1

If at least 1 new AIDS case is diagnosed on a certain day, what is the probability that at most 3 new AIDS cases are diagnosed on this day?

(A) 0.45 (B) 0.50 (C) 0.55 (D) 0.60 (E) 0.65

(F) 0.70 (G) 0.75 (H) 0.80 (I) 0.85 (J) 0.90

Questions 11 and 12 refer to the following setup. Suppose records of adult deaths show that 20% of the adults were smokers and 80% were non-smokers. Among adult smokers, 40% of deaths were due to lung cancer, while among adult non-smokers, only 5% of deaths were due to lung cancer.

11. What proportion of deaths in this population of death records were NOT due to lung cancer?

(A) 0.50 (B) 0.60 (C) 0.70 (D) 0.80 (E) 0.88

(F) 0.90 (G) 0.92 (H) 0.94 (I) 0.96 (J) 0.98

12. If a death record randomly chosen from this population shows lung cancer to be the cause of death, what is the probability that this person was a smoker?

(A) 0.05 (B) 0.15 (C) 0.25 (D) 0.35 (E) 0.45

(F) 0.55 (G) 0.65 (H) 0.75 (I) 0.85 (J) 0.95

13. The germination rate for a certain seed variety is 80%. Suppose 17 seeds of this variety are planted. If at least 13 of the planted seeds germinate, what is the probability that at least 15 of them germinate?

(A) 0.00 (B) 0.05 (C) 0.10 (D) 0.15 (E) 0.20

(F) 0.25 (G) 0.30 (H) 0.35 (I) 0.40 (J) 0.45

14. In a certain culture, bacterial colonies occur at random on an agar plate at the average rate of 10 colonies per 100 cm². What is the probability that between 3 and 6 (inclusive, i.e. 3, 4, 5, or 6) such colonies will occur on an agar plate of size 50 cm²?

(A) 0.05 (B) 0.15 (C) 0.25 (D) 0.35 (E) 0.45

(F) 0.55 (G) 0.65 (H) 0.75 (I) 0.85 (J) 0.95


15. In a certain tree population, trees are graded on the following scale: 0, very unhealthy; 1, somewhat unhealthy; 2, healthy. The population distribution is given in the following table:

x 0 1 2

f(x) 0.05 0.25 0.70

Let X̄ be the sample mean score for a random sample of 2 trees in this population. Compute P(X̄ ≥ 1.5).

(A) 0.05 (B) 0.15 (C) 0.25 (D) 0.35 (E) 0.45

(F) 0.55 (G) 0.65 (H) 0.75 (I) 0.85 (J) 0.95

Questions 16 and 17 refer to the following setup. The serum iron content in a population of subjects is known to be normally distributed with mean 112 mg/mL and standard deviation 5 mg/mL.

16. Find the probability that a randomly chosen subject has serum iron content below 105 mg/mL.

(A) 0.05 (B) 0.08 (C) 0.11 (D) 0.14 (E) 0.17

(F) 0.20 (G) 0.23 (H) 0.26 (I) 0.29 (J) 0.32

17. Find the point x0 such that only 10% of these subjects have serum iron content above x0.

(A) 116.5 (B) 117.0 (C) 117.5 (D) 118.0 (E) 118.5

(F) 119.0 (G) 119.5 (H) 120.0 (I) 120.5 (J) 121.0

18. The diastolic blood pressure of a certain person is a normally distributed random variable with mean 83 mm Hg and standard deviation 14 mm Hg. Suppose this person’s blood pressure is taken on 5 different (independent) days. What is the probability that the diastolic blood pressure reading will be above 90 mm Hg on at most 1 of these 5 occasions?

(A) 0.15 (B) 0.20 (C) 0.25 (D) 0.30 (E) 0.35

(F) 0.40 (G) 0.45 (H) 0.50 (I) 0.55 (J) 0.60

Questions 19 through 21 refer to the following setup. When a batch of a certain chemical product is prepared, the amount of a particular impurity in the batch is a random variable with mean value 5.2 grams and standard deviation 1.3 grams. Let X̄ denote the sample mean amount of impurity from a random sample of 50 batches of this chemical product.

19. Find the expected value of X̄.

(A) 4.2 (B) 4.3 (C) 4.5 (D) 4.6 (E) 4.7

(F) 4.8 (G) 4.9 (H) 5.0 (I) 5.1 (J) 5.2

20. Find the variance of X̄.

(A) 0.05 (B) 0.15 (C) 0.30 (D) 0.60 (E) 0.90

(F) 1.20 (G) 1.50 (H) 1.80 (I) 2.10 (J) 2.40


21. Find the probability that the sample mean X̄ takes a value between 5.1 and 5.4 grams.

(A) 0.05 (B) 0.15 (C) 0.25 (D) 0.35 (E) 0.45

(F) 0.55 (G) 0.65 (H) 0.75 (I) 0.85 (J) 0.95

Questions 22 through 24 refer to the following setup. Well-developed pasture soils should contain indigenous mycorrhizal fungi, which greatly stimulate the growth of clover and rye grass. The mean number of spores per gram of soil in good pasture land is 9.0. In eroded areas, the mycorrhizal infectivity is thought to be highly reduced. From a random sample of 50 soil samples obtained from several eroded areas, the sample mean number of spores per gram of soil was found to be 2.4 with a sample standard deviation of 2.1. Let µ denote the true mean number of spores per gram of soil in the sampled areas.

22. Do these data support the contention that µ is below 3.0? Compute the P-value for the appropriate test.

(A) 0.005 (B) 0.01 (C) 0.02 (D) 0.03 (E) 0.04

(F) 0.05 (G) 0.10 (H) 0.15 (I) 0.20 (J) 0.25

23. Compute the lower limit of an 85% confidence interval for µ.

(A) 1.5 (B) 1.6 (C) 1.7 (D) 1.8 (E) 1.9

(F) 2.0 (G) 2.1 (H) 2.2 (I) 2.3 (J) 2.4

24. Use these data as a pilot study to find the sample size n needed to estimate µ to within ±0.5with 99% confidence. (i.e. 99% CI length = 1.0)

(A) 60 (B) 70 (C) 80 (D) 90 (E) 100

(F) 110 (G) 120 (H) 130 (I) 140 (J) 150

Questions 25 through 27 refer to the following setup. A study of the germination rates of two different varieties of rye grass seed reported that 72 of 100 Variety A seeds germinated, whereas 81 of 100 Variety B seeds germinated. Let p1 and p2 denote the true germination rates for Variety A and Variety B, respectively.

25. Do these data provide evidence that the two germination rates differ? Compute the P-value for the appropriate test, and then interpret the result by stating at which of the following levels of significance the null hypothesis should be rejected.

(i) α = 0.01 (ii) α = 0.05 (iii) α = 0.10



26. Compute the upper limit of a 95% confidence interval for p1 − p2.

(A) −0.11 (B) −0.09 (C) −0.07 (D) −0.05 (E) −0.03

(F) −0.01 (G) 0.01 (H) 0.03 (I) 0.05 (J) 0.07


27. Use these data as a pilot study to determine the common sample size n (n = n1 = n2) needed to estimate p1 − p2 within ±5 percentage points with 95% confidence (95% CI length = 0.10).

(A) 150 (B) 250 (C) 350 (D) 450 (E) 550

(F) 650 (G) 750 (H) 850 (I) 950 (J) 1050

Questions 28 through 30 refer to the following setup. The following data constitute an observed random sample of lengths of ears (in centimetres) collected in a study of size inheritance in a hybrid variety of corn:

16.1 15.3 14.4 10.5 18.2 16.7 16.4 15.8 17.1 12.4

28. Is it reasonable to conclude that the mean ear length µ for this variety of corn is larger than 14 cm? These data provide { (a) very strong (b) strong (c) moderate (d) little or no } evidence against H0 that µ = 14 which suggests that { (i) µ = 14 (ii) µ > 14 }. Choose the correct pair.

(A) (a,i) (B) (a,ii) (C) (b,i) (D) (b,ii)

(E) (c,i) (F) (c,ii) (G) (d,i) (H) (d,ii)

29. Compute the upper limit of a 90% confidence interval for the mean length µ of ears in plants of this variety.

(A) 16.5 (B) 16.6 (C) 16.7 (D) 16.8 (E) 16.9

(F) 17.0 (G) 17.1 (H) 17.2 (I) 17.3 (J) 17.4


(i) To ensure validity of the above calculations, it is necessary that the population distribution of ear lengths has (at least approximately) a t-distribution with 9 degrees of freedom.

(ii) The estimated value of the mean ear length µ is 15.29 cm.

(iii) The estimated standard error of the sample mean for estimating µ is 2.31 cm.




31. To determine whether waste discharge by a chemical plant is polluting the local river, the river water was sampled at two locations, one upstream and one downstream from the discharge site. Independent water samples of sizes n1 = 5 and n2 = 8 were selected from the upstream and downstream locations, respectively. The concentration level (ppm) of a suspected chemical pollutant was determined in each sample, with the following results:

Upstream: 22.5 29.7 20.4 28.5 25.3

Downstream: 24.8 30.4 32.3 26.4 27.8 31.5 34.3 29.0

Let µ1 and µ2 denote the true mean chemical levels in the upstream and downstream locations, respectively, and assume the two sampled populations of pollutant concentration levels are both normally distributed.

(a) For making inferences about µ1 − µ2 based on these data, the most appropriate procedures to use are:

□ Z-procedures,

□ Pooled t-procedures,

□ Unpooled (Smith-Satterthwaite) t-procedures.

Check the box beside the correct answer.

(b) Find the estimated value of µ1 − µ2.

(c) Find the estimated standard error of x̄1 − x̄2 for estimating µ1 − µ2.

(d) Construct a 95% confidence interval for µ1 − µ2.

(e) Find the common sample size n (n = n1 = n2) needed to estimate µ1 − µ2 within ±1 ppm with 95% confidence (i.e. 95% CI length = 2 ppm).

32. The urinary fluoride concentration (ppm) was determined for 6 randomly chosen livestock both at the beginning of and in the middle of their grazing period in a region previously exposed to fluoride pollution.

Subject                   1      2      3      4      5      6
Beginning of Period     24.7   46.1   18.5   29.5   26.3   33.9
Middle of Period        12.4   14.1    9.6   19.5   17.7   10.6

Do these data suggest that there has been a decrease in true average urinary fluoride concentration of more than 12 ppm during the period under consideration? Assume the relevant population distribution(s) is/are normal.

(a) Using standard notation, define the population parameter(s) being tested.



(d) Specify the distribution (including the degrees of freedom if relevant) to be used for computing the P-value, and compute (or bracket) the P-value within Table accuracy.



ANSWERS FOR SAMPLE FINAL EXAMINATION 1

1. 0.0458 (C) 2. 0.375 (D) 3. 0.325 (H)

4. 0.689 (H) 5. 0.139 (D) 6. (B)

7. 0.244 (E) 8. 1.60 (C) 9. 0.735 (D)

10. 0.867 (I) 11. 0.88 (E) 12. 0.667 (G)

13. 0.4083 (I) 14. 0.637 (G) 15. 0.84 (I)

16. 0.0808 (B) 17. 118.4 (E) 18. 0.511 (H)

19. 5.2 (J) 20. 0.0338 (A) 21. 0.5675 (F)

22. 0.0217 (C) 23. 1.97 (F) 24. 117 (G)

25. P-value = 0.1310 (A) 26. 0.0268 (H) 27. 547 (E)

28. 0.05 < P-value < 0.10 (F) 29. 16.63 (B) 30. (C)
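Several of the discrete-probability answers above can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later with only the standard library; the helper names binom_pmf and poisson_pmf and all variable names are illustrative and are not part of the course materials. It reworks Questions 11 through 14 using the total probability rule, Bayes' rule, the Binomial distribution, and the Poisson distribution.

```python
from math import comb, exp, factorial


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# Questions 11 and 12: total probability, then Bayes' rule.
p_smoker = 0.20
p_lc_given_smoker = 0.40
p_lc_given_nonsmoker = 0.05
p_lc = p_smoker * p_lc_given_smoker + (1 - p_smoker) * p_lc_given_nonsmoker
p_not_lc = 1 - p_lc                                     # Question 11
p_smoker_given_lc = p_smoker * p_lc_given_smoker / p_lc  # Question 12

# Question 13: P(X >= 15 | X >= 13) for X ~ Binomial(17, 0.8).
p_ge_13 = sum(binom_pmf(k, 17, 0.8) for k in range(13, 18))
p_ge_15 = sum(binom_pmf(k, 17, 0.8) for k in range(15, 18))
q13 = p_ge_15 / p_ge_13

# Question 14: 10 colonies per 100 cm^2 gives a mean of 5 on a 50 cm^2 plate.
lam = 10 * 50 / 100
q14 = sum(poisson_pmf(k, lam) for k in range(3, 7))

print(round(p_not_lc, 2), round(p_smoker_given_lc, 3), round(q13, 4), round(q14, 3))
```

On the real exam these values come from the Binomial and Poisson tables, so small rounding differences from the script are to be expected.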

31. (a) Pooled t-procedures

(b) −4.28ppm

(c) 1.97 ppm

(d) −4.28± (2.201)(1.97) or (−8.62, 0.06)

(e) 98 (assuming large sample sizes, and therefore using the unequal-variance formula)
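The figures in answer 31 can be reproduced with a short script. This is a minimal sketch, assuming Python 3 with only the standard library; the variable names are illustrative. The critical values 2.201 (t with 11 degrees of freedom, upper-tail area 0.025) and 1.96 (Standard Normal) are copied from tables rather than computed, and part (e) assumes the large-sample formula n = z²(s1² + s2²)/E² with margin E = 1 ppm.

```python
from math import ceil, sqrt
from statistics import mean, variance

upstream = [22.5, 29.7, 20.4, 28.5, 25.3]
downstream = [24.8, 30.4, 32.3, 26.4, 27.8, 31.5, 34.3, 29.0]
n1, n2 = len(upstream), len(downstream)

# (b) Point estimate of mu1 - mu2.
diff = mean(upstream) - mean(downstream)

# (c) Pooled variance and the estimated standard error of the difference.
s1_sq, s2_sq = variance(upstream), variance(downstream)
pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
se = sqrt(pooled_var * (1 / n1 + 1 / n2))

# (d) 95% confidence interval; 2.201 is a table value for t(11).
t_crit = 2.201
ci = (diff - t_crit * se, diff + t_crit * se)

# (e) Common sample size for a margin of 1 ppm; 1.96 is a table value.
z_crit = 1.96
margin = 1.0
n_needed = ceil(z_crit**2 * (s1_sq + s2_sq) / margin**2)

print(round(diff, 2), round(se, 2), [round(x, 2) for x in ci], n_needed)
```

The script carries full precision through each step, so the interval endpoints can differ from the hand calculation in the second decimal place.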

32. (a) µD = true mean difference (u.f.c. at beginning - u.f.c. at middle)

(b) H0 : µD = 12 vs. H1 : µD > 12

(c) Tobs = (d̄ − 12)/(sD/√n) = (15.85 − 12)/3.93 = 0.98

(d) P-value = P (T(5) ≥ 0.98) ∴ 0.10 < P-value < 0.25

(e) There is little or no evidence (0.10 < P-value < 0.25) against H0 : µD = 12 which suggests that µD > 12. The estimated value of µD is 15.85 ppm, with estimated standard error = 3.93 ppm.
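The paired-t calculation in answer 32 can be sketched as follows, assuming Python 3 with only the standard library; the variable names are illustrative. The bracketing values 0.727 and 1.476 are the upper-tail 0.25 and 0.10 critical points of t with 5 degrees of freedom, copied from a t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

beginning = [24.7, 46.1, 18.5, 29.5, 26.3, 33.9]
middle = [12.4, 14.1, 9.6, 19.5, 17.7, 10.6]

# Work with the within-subject differences (beginning minus middle).
diffs = [b - m for b, m in zip(beginning, middle)]
n = len(diffs)
d_bar = mean(diffs)
se = stdev(diffs) / sqrt(n)

# Test statistic for H0: mu_D = 12 against H1: mu_D > 12.
t_obs = (d_bar - 12) / se
df = n - 1

# Table values for t(5): upper-tail areas 0.25 and 0.10.
t_025, t_010 = 0.727, 1.476
p_between_010_and_025 = t_025 < t_obs < t_010

print(round(d_bar, 2), round(se, 2), round(t_obs, 2), df, p_between_010_and_025)
```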



Instructions

• The Sharp EL-510R scientific calculator is allowed. This is the only calculator that is allowed. A Formula List page is provided. NO other aids such as books, notes, or scratch paper are permitted.



• The multiple-choice questions are worth 2 marks each, and questions 35 and 36 are worth 6 marks each. The maximum score is 80 marks.


Questions 1 and 2 refer to the following setup. In a study of diabetics, the following data (listed in increasing order) on age at onset of diabetes were obtained:

26.2 30.5 35.5 38.0 39.8 40.1 42.1 51.4 52.2 53.8 55.6 59.3 60.9 65.4

1. Compute the difference x̄ − x̃ (sample mean minus sample median).

(A) −0.9 (B) −0.7 (C) −0.5 (D) −0.3 (E) −0.1

(F) 0.1 (G) 0.3 (H) 0.5 (I) 0.7 (J) 0.9

2. What percentage of the observations are within ±2 sample standard deviations of the sample mean?

(A) 55% (B) 60% (C) 65% (D) 70% (E) 75%

(F) 80% (G) 85% (H) 90% (I) 95% (J) 100%


3. In humans, the allele for normal skin pigmentation S is dominant over that for albinism s. The allele for free earlobes F is dominant over that for attached earlobes f. A woman has genotype SsFf, and her husband has genotype SsFf. What is the probability that their offspring will have normal skin and free earlobes?

(A) 0.05 (B) 0.15 (C) 0.25 (D) 0.35 (E) 0.45

(F) 0.55 (G) 0.65 (H) 0.75 (I) 0.85 (J) 0.95

Questions 4 through 7 refer to the following setup. A chemist analyses seawater samples for three heavy metals: iron, lead, and mercury. She finds that 8% of the samples taken from near the mouth of a river on which numerous industrial plants are located contain high levels of all three of these metals, 14% of the samples contain high levels of iron and lead, 10% of the samples contain high levels of iron and mercury, 45% of the samples contain high levels of lead or mercury, 40% of the samples contain high levels of iron, 35% of the samples contain high levels of lead, and 25% of the samples contain high levels of mercury.

4. What is the probability that a randomly chosen sample contains high levels of iron or lead?

(A) 0.50 (B) 0.53 (C) 0.56 (D) 0.59 (E) 0.62

(F) 0.65 (G) 0.68 (H) 0.71 (I) 0.74 (J) 0.77

5. What is the probability that a randomly chosen sample contains a high level of only mercury?

(A) 0.05 (B) 0.10 (C) 0.15 (D) 0.20 (E) 0.25

(F) 0.30 (G) 0.35 (H) 0.40 (I) 0.45 (J) 0.50

6. If a randomly chosen sample contains high levels of lead or mercury, what is the probability that it contains a high level of iron?

(A) 0.05 (B) 0.10 (C) 0.15 (D) 0.20 (E) 0.25

(F) 0.30 (G) 0.35 (H) 0.40 (I) 0.45 (J) 0.50

7. Which of the following three statements are true?

(i) The event that a randomly chosen sample contains a high level of iron is independent of the event that it contains a high level of lead.

(ii) The event that a randomly chosen sample contains a high level of iron is independent of the event that it contains a high level of mercury.

(iii) The event that a randomly chosen sample contains a high level of iron is independent of the event that it contains a high level of only lead.




8. A medical team investigated the relation between immunological factors and survival after a heart attack. Blood specimens from 213 male heart-attack patients were tested for presence of antibody to milk protein. The patients were followed to determine whether they lived for 6 months following their heart attack. The results are given in the following table:

                                          Antibody to Milk Protein
                                          Present      Absent
Died Within 6 Months of Heart-Attack         29          10
Alive 6 Months After Heart-Attack            80          94
Total                                                          213

Based on these data, estimate the Relative Risk of death within 6 months of heart-attack for patients with the antibody to milk protein versus patients without the antibody.

(A) 0.60 (B) 0.80 (C) 1.00 (D) 1.20 (E) 1.60

(F) 2.00 (G) 2.40 (H) 2.80 (I) 3.20 (J) 3.60

Questions 9 and 10 refer to the following setup. In a certain population of the European starling, there are 5000 nests with young. The distribution of brood size (number of young in a nest) is given in the accompanying table.

Brood Size                     1    2    3     4     5    6    7   8  9  10
Frequency (No. of Broods)     90  230  610  1400  1760  750  130  26  3   1

Suppose one of the 5000 broods is to be chosen at random, and let X be the size of the chosen brood.

9. Compute the mean value (expected value) of X.

(A) 4.3 (B) 4.4 (C) 4.5 (D) 4.6 (E) 4.7

(F) 4.8 (G) 4.9 (H) 5.0 (I) 5.1 (J) 5.2

10. Compute P (X ≥ 4|X ≤ 5).

(A) 0.53 (B) 0.56 (C) 0.59 (D) 0.62 (E) 0.65

(F) 0.68 (G) 0.71 (H) 0.74 (I) 0.77 (J) 0.80

Questions 11 and 12 refer to the following setup. According to medical data, the mammography screening test for breast cancer has a 10% false-positive rate and a 17% false-negative rate.

11. If a mammography screening test is administered on a woman randomly chosen from the age-group where 5% of the age-group have breast cancer, what is the probability that this woman will test positive?

(A) 0.04 (B) 0.07 (C) 0.10 (D) 0.13 (E) 0.16

(F) 0.19 (G) 0.22 (H) 0.25 (I) 0.28 (J) 0.31


12. If a mammography screening test is administered on a woman randomly chosen from the age-group where 5% of the age-group have breast cancer and a negative result (no cancer indicated) is obtained, what is the probability that this woman does not have breast cancer?

(A) 0.81 (B) 0.83 (C) 0.85 (D) 0.87 (E) 0.89

(F) 0.91 (G) 0.93 (H) 0.95 (I) 0.97 (J) 0.99

13. Suppose X and Y are independent such that µX = 50, µY = 16, σX = 10, σY = 2. Compute the quantity µ(X−3Y) + σ(X−3Y), that is, the mean of X − 3Y plus the standard deviation of X − 3Y.

(A) 0 (B) 2 (C) 4 (D) 6 (E) 8

(F) 10 (G) 12 (H) 14 (I) 16 (J) 18

Questions 14 and 15 refer to the following setup. The shell of the land snail Limicolaria martensiana has two colour forms: streaked and pallid. In a certain population of these snails, 60% of the individuals have streaked shells.

14. Suppose that a random sample of 10 snails is to be chosen from this population. Find the probability that the percentage of streaked-shelled snails in the sample will be 60%.

(A) 0.20 (B) 0.25 (C) 0.30 (D) 0.35 (E) 0.40

(F) 0.45 (G) 0.50 (H) 0.55 (I) 0.60 (J) 0.65

15. Suppose that 5 different random samples of 10 snails each are to be independently chosen from this population. What is the probability that exactly one of these 5 samples will consist of 5 streaked-shelled and 5 pallid-shelled snails? (Hint: First find the chance that a random sample of 10 will contain 5 of each colour form.)

(A) 0.15 (B) 0.20 (C) 0.25 (D) 0.30 (E) 0.35

(F) 0.40 (G) 0.45 (H) 0.50 (I) 0.55 (J) 0.60

Questions 16 and 17 refer to the following setup. In a certain field of barley, powdery mildew spots occur at random on the barley plants at the average rate of 2 spots per barley plant.

16. If 5 barley plants are chosen at random from the field, what is the probability that the total number of mildew spots on these 5 barley plants will be greater than or equal to 10 spots?

(A) 0.31 (B) 0.34 (C) 0.37 (D) 0.40 (E) 0.43

(F) 0.46 (G) 0.49 (H) 0.52 (I) 0.55 (J) 0.58

17. If 5 barley plants are chosen at random from the field, what is the probability that each of the 5 chosen plants will have at least 2 mildew spots? (Hint: First find the chance that a randomly chosen barley plant will have at least 2 mildew spots.)

(A) 0.01 (B) 0.04 (C) 0.07 (D) 0.10 (E) 0.13

(F) 0.16 (G) 0.19 (H) 0.22 (I) 0.25 (J) 0.28


Questions 18 through 20 refer to the following setup. In an experiment to test associations, a biologist places two animals in a rectangular tank marked into three equal sections. At specified time intervals, the biologist notes the “distance” X between the animals. This may be zero, if the two animals are in the same section; one, if the two animals are in adjacent sections; or two, if the two animals are in opposite ends of the tank. Assuming each animal behaves independently of the other and is equally likely to be in any section, the probability distribution of X is given in the following table:

x 0 1 2

f(x) 3/9 4/9 2/9

Let X1, X2 be a random sample of size n = 2 from the population random variable X. Then X̄ = (X1 + X2)/2.

18. Compute P(X̄ ≤ 1).

(A) 0.30 (B) 0.35 (C) 0.40 (D) 0.45 (E) 0.50

(F) 0.55 (G) 0.60 (H) 0.65 (I) 0.70 (J) 0.75

19. Compute the variance of X̄. (Hint: First find the mean and variance of population rv X.)

(A) 0.25 (B) 0.35 (C) 0.45 (D) 0.55 (E) 0.65

(F) 0.75 (G) 0.85 (H) 1.00 (I) 1.15 (J) 1.30

20. Compute P (X1 = X2).

(A) 0.05 (B) 0.10 (C) 0.15 (D) 0.20 (E) 0.25

(F) 0.30 (G) 0.35 (H) 0.40 (I) 0.45 (J) 0.50

Questions 21 and 22 refer to the following setup. In a certain population of the herring Pomolobus aestivalis, the lengths of the individual fish follow a normal distribution. The mean length of the fish is 54 mm, and the standard deviation is 4 mm.

21. What percentage of the fish are between 51 and 60 mm long?

(A) 45% (B) 50% (C) 55% (D) 60% (E) 65%

(F) 70% (G) 75% (H) 80% (I) 85% (J) 90%

22. The 85th percentile of the fish length distribution is the value x such that 85% of the fish lengths are less than or equal to x and 15% are greater than or equal to x. Find x.

(A) 56.6 (B) 56.8 (C) 57.0 (D) 57.2 (E) 57.4

(F) 57.6 (G) 57.8 (H) 58.0 (I) 58.2 (J) 58.4


23. The basal diameter of a sea anemone is an indicator of its age. In a certain large population of anemones, the population mean diameter is 4.2 cm, and the standard deviation is 1.4 cm. Let X̄ denote the sample mean diameter of 36 anemones randomly chosen from the population. Find P(4.1 ≤ X̄).

(A) 0.46 (B) 0.49 (C) 0.52 (D) 0.55 (E) 0.58

(F) 0.61 (G) 0.64 (H) 0.67 (I) 0.70 (J) 0.73

Questions 24 and 25 refer to the following setup. A zoologist measured tail length in 86 individuals, all in the one-year age group of the deer mouse Peromyscus. The sample mean length was 60.43 mm and the sample standard deviation was 3.06 mm.

24. Compute the upper limit of an 82% confidence interval for the population mean tail length.

(A) 60.8 (B) 60.9 (C) 61.0 (D) 61.1 (E) 61.2

(F) 61.3 (G) 61.4 (H) 61.5 (I) 61.6 (J) 61.7

25. Which of the following statements are true regarding the 82% confidence interval whose upper limit was to be computed in Question 24?

(i) This confidence interval contains approximately 82% of the sample.

(ii) This confidence interval contains approximately 82% of the population.

(iii) We are approximately 82% confident that this confidence interval contains the sample mean.



26. Six healthy three-year-old female Suffolk sheep were injected with the antibiotic Gentamicin, at a dosage of 10 mg/kg body weight. Their blood serum concentrations (µg/mL) of Gentamicin 1.5 hours after injection were as follows.

33 26 34 31 23 25

Assuming these data constitute an observed random sample from a normally distributed population, compute the lower limit of a 95% confidence interval for the population mean blood serum concentration of Gentamicin.

(A) 23.2 (B) 23.4 (C) 23.6 (D) 23.8 (E) 24.0

(F) 24.2 (G) 24.4 (H) 24.6 (I) 24.8 (J) 25.0


Questions 27 and 28 refer to the following setup. In a study on plant growth, a plant physiologist grew 60 individually potted soybean seedlings of the type called Wells II. She raised the plants in the greenhouse under identical environmental conditions (light, temperature, soil, etc.). She measured total stem length (cm) for each plant after 16 days of growth, obtaining a sample mean stem length of 21.4 cm and a sample standard deviation of 3.7 cm.

27. Do these data provide substantial support for the hypothesis that the true mean stem length µ of Wells II soybean plants grown under the specified conditions is larger than 20 cm? These data provide { (a) very strong (b) strong (c) moderate (d) little or no } evidence against H0 which suggests that { (i) µ = 20 (ii) µ > 20 }. Choose the correct pair.

(A) (a,i) (B) (b,i) (C) (c,i) (D) (d,i)


28. Which of the following statements are assumptions needed to ensure validity of the conclusion reached in Question 27?

(i) The 60 stem length measurements approximate a normally distributed population.

(ii) The 60 stem length measurements constitute an observed random sample.

(iii) The sampled population is approximately normally distributed.



29. The permissible exposure to benzene in the oil refining industry is one part per million (ppm). An industrial hygienist at a specific oil company measured exposure levels of 6 workers to assess compliance to this standard. The 6 observations are:

Observation 1 2 3 4 5 6

Benzene (ppm) 0.6 0.9 0.5 0.8 1.1 1.4

Using these data as a pilot study, approximately how many additional observations will be needed to estimate the true mean benzene exposure level within ±0.1 ppm with 95% confidence?

(A) 31 (B) 33 (C) 35 (D) 37 (E) 39

(F) 41 (G) 45 (H) 50 (I) 60 (J) 70

30. In an ecological study of the Carolina Junco, 50 birds were captured from a certain population; of these, 33 were male. Is this evidence that males outnumber females in the population? Compute the P-value for the appropriate test, and then interpret the result by stating at which levels of significance the null hypothesis should be rejected.

(i) α = 0.01 (ii) α = 0.05 (iii) α = 0.10




31. Researchers studied the effect of a houseplant fertilizer on radish sprout growth. They randomly selected 50 radish seeds to serve as controls, while 50 others were planted in aluminum planters to which fertilizer sticks were added. Other conditions were held constant between the two groups. For the control group, the sample mean height two weeks after germination was 2.58 cm with sample standard deviation 0.65 cm. For the fertilized group, the sample mean height two weeks after germination was 2.04 cm with sample standard deviation 0.72 cm. Compute the upper limit of a 98% confidence interval for the true mean height difference (control group minus fertilized group).

(A) 0.62 (B) 0.65 (C) 0.68 (D) 0.71 (E) 0.74

(F) 0.77 (G) 0.80 (H) 0.83 (I) 0.86 (J) 0.89

32. In studying the relationship between smoking and low birthweight, the conditional probabilities of interest are: p1 = P(Low birthweight | Smoker) and p2 = P(Low birthweight | Nonsmoker). A pilot study is conducted by choosing a group of 100 smokers and a group of 100 nonsmokers, and then observing the birthweights of their infants. The data obtained are given in the following table.

                        Smoking Status of Mother
                        Smoker     Non-Smoker
Low Birthweight           20           10
Normal Birthweight        80           90

Based on these preliminary data, approximately how large should the common sample size n = n1 = n2 be in order to estimate p1 − p2 within ±4 percentage points with 90% confidence?

(A) 300 (B) 325 (C) 350 (D) 375 (E) 400

(F) 425 (G) 450 (H) 475 (I) 500 (J) 600


33. Two varieties of lettuce were grown for 16 days in a controlled environment. The following table shows the total dry weight (in grams) of leaves of 9 plants of the variety “Salad Bowl” and 6 plants of the variety “Bibb.”

Salad Bowl 3.06 2.78 2.87 3.52 3.81 3.60 3.30 2.77 3.62

Bibb 1.31 1.17 1.72 1.20 1.55 1.53

(a) Construct a 95% confidence interval for µ1 − µ2, where µ1 denotes the true mean total dry weight of the leaves of the variety “Salad Bowl” and µ2 denotes the true mean total dry weight of the leaves of the variety “Bibb.”

(b) What assumptions underlie the procedure used to construct the above confidence interval?

34. For each of 6 horses, a veterinary anatomist measured the density of nerve cells at specified sites in the intestine. The results for site I (mid-region of jejunum) and site II (mesenteric region of jejunum) are given in the following table.

Animal 1 2 3 4 5 6

Site I 50.6 39.2 35.2 17.0 11.2 14.2

Site II 38.0 18.6 23.2 19.0 6.6 16.4

Difference 12.6 20.6 12.0 −2.0 4.6 −2.2

Do these data suggest that the true mean density of nerve cells µ1 at site I differs from the true mean density of nerve cells µ2 at site II? Assume the relevant population distribution(s) is/are near normal.

(a) Specify the null and alternative hypotheses.

(b) Compute the observed value of the test statistic.

(c) Specify the distribution (including the degrees of freedom if relevant) to be used for computing the P-value, and compute (or bracket) the P-value within Table accuracy.

(d) State your conclusion, and report the estimated value of the parameter(s) being tested and the estimated standard error.



1. −0.264 (D) 2. (J) 3. 0.5625 (F) 4. 0.61 (E)

5. 0.08 (B) 6. 0.356 (G) 7. (E) 8. 2.767 (H)

9. 4.487 (C) 10. 0.7726 (I) 11. 0.1365 (D) 12. 0.990 (J)

13. 13.66 (H) 14. 0.2508 (B) 15. 0.4096 (F) 16. 0.542 (I)

17. 0.074 (C) 18. 0.753 (J) 19. 0.2716 (A) 20. 0.358 (G)

21. 0.7066 (F) 22. 58.16 (I) 23. 0.6664 (H) 24. 60.87 (B)

25. (A) 26. 23.85 (D) 27. P-value = 0.0017 (E) 28. (C)

29. 37 (D) 30. P-value = 0.0119 (G) 31. 0.8596 (I) 32. 423 (F)
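The Normal-distribution answers above (Questions 21 through 23) can be checked without tables. This is a minimal sketch, assuming Python 3.8 or later, whose standard library provides statistics.NormalDist; the variable names are illustrative. Because the answer key rounds z to two decimals before using the table, the script's values may differ from the key in the third significant figure.

```python
from math import sqrt
from statistics import NormalDist

# Questions 21 and 22: fish lengths are Normal with mean 54 mm and sd 4 mm.
fish = NormalDist(mu=54, sigma=4)
q21 = fish.cdf(60) - fish.cdf(51)   # proportion between 51 and 60 mm
q22 = fish.inv_cdf(0.85)            # 85th percentile

# Question 23: by the Central Limit Theorem the sample mean of n = 36
# diameters is approximately Normal with mean 4.2 and sd 1.4 / sqrt(36).
xbar = NormalDist(mu=4.2, sigma=1.4 / sqrt(36))
q23 = 1 - xbar.cdf(4.1)             # P(4.1 <= sample mean)

print(round(q21, 4), round(q22, 2), round(q23, 4))
```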

1. (a) n1 = 9, x̄1 = 3.259, s1 = 0.400; n2 = 6, x̄2 = 1.413, s2 = 0.220

s1²/s2² = 0.16/0.0484 = 3.31 > 2, so use unpooled-t. Degrees of Freedom = 12.

95% CI for µ1 − µ2 : 1.85± (2.179)(0.16) = 1.85± 0.35 or (1.50, 2.20)

(b) i. independent random samples

ii. normal (or near-normal) population distributions

iii. σ1² ≠ σ2²
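The unpooled (Smith-Satterthwaite) interval in part (a) can be sketched as follows, assuming Python 3 with only the standard library; the variable names are illustrative. The degrees of freedom use the usual Smith-Satterthwaite approximation rounded down, which is an assumption about the convention on the Formula List, and 2.179 is the table value for t with 12 degrees of freedom and upper-tail area 0.025.

```python
from math import floor, sqrt
from statistics import mean, variance

salad_bowl = [3.06, 2.78, 2.87, 3.52, 3.81, 3.60, 3.30, 2.77, 3.62]
bibb = [1.31, 1.17, 1.72, 1.20, 1.55, 1.53]
n1, n2 = len(salad_bowl), len(bibb)

# Per-sample contributions s_i^2 / n_i to the variance of the difference.
a = variance(salad_bowl) / n1
b = variance(bibb) / n2
se = sqrt(a + b)

# Smith-Satterthwaite degrees of freedom, rounded down.
df = floor((a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1)))

# 95% confidence interval for mu1 - mu2; 2.179 is a table value for t(12).
diff = mean(salad_bowl) - mean(bibb)
t_crit = 2.179
ci = (diff - t_crit * se, diff + t_crit * se)

print(df, round(se, 2), [round(x, 2) for x in ci])
```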

2. (a) H0 : µD = 0 vs. H1 : µD ≠ 0 or H0 : µ1 − µ2 = 0 vs. H1 : µ1 − µ2 ≠ 0

(b) Tobs = (7.6 − 0)/(9.06/√6) = (7.6 − 0)/3.7 = 2.054

(c) P-value = P (T(5) ≥ 2.054 or T(5) ≤ −2.054) = 2P (T(5) ≥ 2.054) ∴ 0.05 < P-value < 0.10

(d) There is moderate evidence (0.05 < P-value < 0.10) against H0 which suggests that µ1 − µ2 ≠ 0. The estimated value of µD = µ1 − µ2 is 7.6 with estimated standard error = 3.7.



Instructions

• The Sharp EL-510R scientific calculator is allowed. This is the only calculator that is allowed. A Formula List page is provided. NO other aids such as books, notes, or scratch paper are permitted.


• Questions 33 through 35 are full-answer questions. For each of these questions, write out your solution carefully and completely. Marks will be deducted for incomplete or poorly presented solutions.

• The multiple-choice questions are worth 2 marks each, questions 33 and 34 are worth 6 marks each, and question 35 is worth 4 marks. The maximum score is 80 marks.


Questions 1 and 2 refer to the following setup. In a study of the lizard Sceloporus occidentalis, biologists measured the distance (m) run in two minutes for each of 15 animals. The results (listed in increasing order) were as follows:

18.4 22.2 24.5 26.4 27.5 28.7 30.6 32.9 32.9 34.0 34.8 37.5 42.1 45.5 45.5

1. Compute the difference x̄ − x̃ (sample mean minus sample median).

(A) −0.9 (B) −0.7 (C) −0.5 (D) −0.3 (E) −0.1

(F) 0.1 (G) 0.3 (H) 0.5 (I) 0.7 (J) 0.9

2. What percentage of the observations are within ±1 sample standard deviation of the sample mean?

(A) 50% (B) 55% (C) 60% (D) 65% (E) 70%

(F) 75% (G) 80% (H) 85% (I) 90% (J) 95%


3. In guinea pigs, short hair (S) is dominant to long hair (s) and black fur (B) is dominant to albino fur (b). A female which is SsBb is mated to a male that is Ssbb. What is the probability that an offspring from this mating will be black with short hair?

(A) 0.05 (B) 0.15 (C) 0.25 (D) 0.35 (E) 0.45

(F) 0.55 (G) 0.65 (H) 0.75 (I) 0.85 (J) 0.95

4. A large population of the fruitfly Drosophila melanogaster is maintained in a lab. In the population, 30% of the individuals are black because of a mutation, while 70% of the individuals have the normal gray body colour. Suppose a simple random sample of two flies is chosen from this population. What is the probability that one black fly and one gray fly are chosen?

(A) 0.05 (B) 0.10 (C) 0.15 (D) 0.20 (E) 0.25

(F) 0.30 (G) 0.35 (H) 0.40 (I) 0.45 (J) 0.50

Questions 5 through 8 refer to the following setup. The table below shows the relationship between hair colour and eye colour for a group of 1770 German men.

                     Hair Colour
                     Brown   Black   Red
Eye Colour   Brown   400     300     20
             Blue    800     200     50
                             Total:  1770

Suppose we choose someone at random from this group.

5. What is the probability that the chosen person has black hair or blue eyes?

(A) 0.50 (B) 0.55 (C) 0.60 (D) 0.65 (E) 0.70

(F) 0.75 (G) 0.80 (H) 0.85 (I) 0.90 (J) 0.95

6. What is the probability that the chosen person has brown eyes and brown or black hair?

(A) 0.05 (B) 0.15 (C) 0.25 (D) 0.35 (E) 0.45

(F) 0.55 (G) 0.65 (H) 0.75 (I) 0.85 (J) 0.95

7. If the chosen person has blue eyes, what is the probability he has red hair?

(A) 0.01 (B) 0.02 (C) 0.03 (D) 0.04 (E) 0.05

(F) 0.06 (G) 0.07 (H) 0.08 (I) 0.09 (J) 0.10


8. Which of the following three statements are true?

(i) The events “chosen man has blue eyes” and “chosen man has brown eyes” are independent.

(ii) The events “chosen man has brown hair” and “chosen man has brown eyes” are mutually exclusive.

(iii) The events “chosen man has brown eyes” and “chosen man has black hair” are independent.



9. Recall: If C denotes the event that a certain condition is present, and E denotes the event that a certain risk factor is present, then the ratio P(C|E)/P(C|E′) is called the relative risk. In a study of the effects of smoking, 9793 pregnant women were asked about their smoking habits. The table below shows the incidence of low birthweight (2500 g or less) among their infants.

                       Smoking Status
                       Smoker   Non-smoker
Birthweight   Low      237      197
              Normal   3489     5870

From these data we estimate that the risk of having a low birthweight baby is about RR times as great for smokers as for nonsmokers. Estimate the relative risk RR.

(A) 1.35 (B) 1.45 (C) 1.55 (D) 1.65 (E) 1.75

(F) 1.85 (G) 1.95 (H) 2.05 (I) 2.15 (J) 2.25

Questions 10 and 11 refer to the following setup. In a certain population of the freshwater sculpin Cottus rotheus, the distribution of the number of tail vertebrae, X, is given in the table below.

x 20 21 22 23

P (X = x) 0.03 0.51 0.40 0.06

10. Compute the mean value (expected value) of the number of tail vertebrae X.

(A) 20.6 (B) 20.7 (C) 20.8 (D) 20.9 (E) 21.0

(F) 21.5 (G) 21.6 (H) 21.7 (I) 21.8 (J) 21.9

11. Compute (1 − F(22))/(1 − F(21)), where F is the cumulative distribution function of X.

(A) 0.10 (B) 0.13 (C) 0.16 (D) 0.19 (E) 0.22

(F) 0.25 (G) 0.50 (H) 0.75 (I) 1.00 (J) 1.25


Questions 12 and 13 refer to the following setup. Consider two screening tests for prostate cancer, the Digital Rectal Exam (DRE) and the Prostate Specific Antigen (PSA) test. Assume the DRE test has a 50% false-positive rate and a 45% false-negative rate, while the PSA test has a 35% false-positive rate and a 40% false-negative rate.

12. If both tests are administered (independently) to a man chosen at random from the age-group where 25% of the age-group has prostate cancer, what is the probability that at least one of these two test results will be positive?

(A) 0.30 (B) 0.35 (C) 0.40 (D) 0.45 (E) 0.50

(F) 0.55 (G) 0.60 (H) 0.65 (I) 0.70 (J) 0.75

13. If both tests are administered (independently) to a man chosen at random from the age-group where 25% of the age-group has prostate cancer and both test results are positive, what is the probability that the tested man has prostate cancer?

(A) 0.30 (B) 0.35 (C) 0.40 (D) 0.45 (E) 0.50

(F) 0.55 (G) 0.60 (H) 0.65 (I) 0.70 (J) 0.75

14. Suppose X and Y are independent such that µX = 10, µY = 8, σ²X = 25, and σ²Y = 16. Compute the ratio: Var(2X − Y)/E(2X − Y).

(A) 3 (B) 6 (C) 9 (D) 12 (E) 15

(F) 18 (G) 21 (H) 24 (I) 27 (J) 30

Questions 15 and 16 refer to the following setup. In Canada, 85% of the population has Rh positive blood. Suppose a random sample of 6 people are chosen from this population. Let X denote the number of persons, out of 6, with Rh positive blood.

15. Find the probability that at least 5 of the 6 chosen people have Rh positive blood.

(A) 0.50 (B) 0.55 (C) 0.60 (D) 0.65 (E) 0.70

(F) 0.75 (G) 0.80 (H) 0.85 (I) 0.90 (J) 0.95

16. Find the standard deviation of X.

(A) 0.45 (B) 0.55 (C) 0.65 (D) 0.75 (E) 0.85

(F) 1.0 (G) 1.5 (H) 2.0 (I) 2.5 (J) 3.0

Questions 17 and 18 refer to the following setup. Heart attack emergencies at a local hospital occur at random times at the average rate of 3 per week.

17. What is the probability that at most 8 heart attack emergencies will occur at this hospital during the next two-week period?

(A) 0.50 (B) 0.55 (C) 0.60 (D) 0.65 (E) 0.70

(F) 0.75 (G) 0.80 (H) 0.85 (I) 0.90 (J) 0.95


18. What is the probability that there will be no heart attack emergencies at this hospital during exactly 2 of the next 3 days? (Hint: First compute the probability of no heart attack emergencies during a randomly chosen day.)

(A) 0.30 (B) 0.35 (C) 0.40 (D) 0.45 (E) 0.50

(F) 0.55 (G) 0.60 (H) 0.65 (I) 0.70 (J) 0.75

Questions 19 and 20 refer to the following setup. Height of one-year-old Douglas Fir seedlings is measured on the following discrete scale: 1 (small), 2 (average), 3 (tall). The height distribution of a certain population of one-year-old Douglas Fir seedlings is given in the following table:

x 1 2 3

f(x) 0.35 0.45 0.20

Let X1, X2 be a random sample of size n = 2, where Xi denotes the height of the ith tree (measured on the discrete scale). Then X̄ = (X1 + X2)/2.

19. Compute P(X̄ ≤ 1.5).

(A) 0.05 (B) 0.10 (C) 0.15 (D) 0.20 (E) 0.25

(F) 0.30 (G) 0.35 (H) 0.40 (I) 0.45 (J) 0.50

20. Compute the standard deviation (standard error) of X̄.

(A) 0.25 (B) 0.50 (C) 0.75 (D) 1.00 (E) 1.25

(F) 1.50 (G) 1.75 (H) 2.00 (I) 2.25 (J) 2.50

Questions 21 and 22 refer to the following setup. The shell thickness of the eggs produced by a certain large flock of White Leghorn hens is normally distributed with mean µ = 0.38 mm and standard deviation σ = 0.03 mm.

21. What percentage of eggs produced by this flock have shell thickness less than 0.39 mm?

(A) 51% (B) 53% (C) 55% (D) 57% (E) 59%

(F) 61% (G) 63% (H) 65% (I) 67% (J) 69%

22. Fourteen percent of eggs produced by this flock have shell thickness less than x mm. Find x.

(A) 0.3350 (B) 0.3375 (C) 0.3400 (D) 0.3425 (E) 0.3450

(F) 0.3475 (G) 0.3500 (H) 0.3700 (I) 0.3900 (J) 0.4100

23. The partial pressure of oxygen, PaO2, is a measure of the amount of oxygen in the blood. Assume that the distribution of PaO2 levels among newborns has mean µ = 38 mm Hg and standard deviation σ = 9 mm Hg. If we take a random sample of size n = 36, what is the probability that the sample mean will be greater than 40 mm Hg?

(A) 0.05 (B) 0.06 (C) 0.07 (D) 0.08 (E) 0.09

(F) 0.10 (G) 0.20 (H) 0.30 (I) 0.40 (J) 0.50


Questions 24 and 25 refer to the following setup. As part of a study of the development of the thymus gland, researchers weighed the glands of five chick embryos after 14 days of incubation. The thymus weights (mg) were as follows:

29.6 21.5 28.0 34.6 44.9

24. Compute the upper limit of a 90% confidence interval for the population mean.

(A) 38.5 (B) 39.0 (C) 39.5 (D) 40.0 (E) 40.5

(F) 41.0 (G) 41.5 (H) 42.0 (I) 42.5 (J) 43.0


(i) The estimated standard error of x̄ for estimating the population mean is 8.73 mg.

(ii) The sample standard deviation tends to go down as the sample size goes up.

(iii) Using the same data set, a 65% confidence interval for the population mean would be longer than a 75% confidence interval for the population mean.



26. As part of a study of the treatment of anaemia in cattle, researchers measured the concentration of selenium in the blood of 60 cows that had been given a dietary supplement of selenium (2 mg/day) for one year. The cows were all the same breed (Santa Gertrudis) and had borne their first calf during the year. The sample mean selenium concentration was 6.21 µg/dL and the sample standard deviation was 1.84 µg/dL. Construct the lower limit of a 77% confidence interval for the population mean.

(A) 5.72 (B) 5.75 (C) 5.78 (D) 5.81 (E) 5.84

(F) 5.87 (G) 5.90 (H) 5.93 (I) 5.96 (J) 5.99

Questions 27 and 28 refer to the following setup. A zoologist measured tail length in 6 individuals, all in the 1-year age group, of the deermouse (Peromyscus). The sample mean length was 60.43 mm and the sample standard deviation was 3.06 mm. Assume the population distribution is near normal.

27. Do these data contradict the hypothesized mean tail length of µ = 65 mm? These data provide { (a) very strong (b) strong (c) moderate (d) little or no } evidence against H0, which suggests that { (i) µ < 65 (ii) µ ≠ 65 }. Choose the correct pair.

(A) (a,i) (B) (b,i) (C) (c,i) (D) (d,i)


28. Use these data as a pilot study to find the sample size needed to estimate µ within ±0.5 mm with 99% confidence.

(A) 70 (B) 90 (C) 110 (D) 130 (E) 150

(F) 170 (G) 190 (H) 210 (I) 230 (J) 250


Questions 29 through 31 refer to the following setup. Angina pectoris is a chronic heart condition in which the sufferer has periodic attacks of chest pain. In a study to evaluate the effectiveness of the drug Timolol in preventing angina attacks, patients were randomly allocated to receive a daily dosage of either Timolol or placebo for 28 weeks. The numbers of patients who became completely free of angina attacks are shown in the table below.

Angina Free Not Angina Free Sample Sizes

Timolol 48 112 160

Placebo 21 119 140

Let p1 = true proportion of angina pectoris patients who will become angina free if given Timolol, and let p2 = true proportion of angina pectoris patients who will become angina free if given a placebo.

29. Is there any evidence that p1 is more than 10 percentage points larger than p2 (i.e. p1 > p2 + 0.10)? Compute the P-value for the appropriate test, and then interpret the result by stating at which levels of significance the null hypothesis should be rejected.

(i) α = 0.01 (ii) α = 0.05 (iii) α = 0.10



30. Construct the upper limit of a 95% confidence interval for p1 − p2.

(A) 0.18 (B) 0.19 (C) 0.20 (D) 0.21 (E) 0.22

(F) 0.23 (G) 0.24 (H) 0.25 (I) 0.26 (J) 0.27

31. Use these data as a pilot study to determine the common sample size n (n = n1 = n2) needed to estimate p1 − p2 to within ±7 percentage points with 90% confidence.

(A) 110 (B) 130 (C) 150 (D) 170 (E) 190

(F) 240 (G) 300 (H) 360 (I) 420 (J) 480

32. In a study of larval development in the tufted apple budmoth (Platynota idaeusalis), an entomologist measured the head widths of 50 larvae. All 50 larvae had been reared under identical conditions and had moulted six times. The sample mean head width was 1.20 mm and the sample standard deviation was 0.14 mm. Do these data provide substantial evidence that the true mean head width µ for the sampled population of tufted apple budmoth larvae is smaller than 1.24 mm? Compute the P-value for the appropriate test.

(A) 0.01 (B) 0.02 (C) 0.03 (D) 0.04 (E) 0.05

(F) 0.10 (G) 0.15 (H) 0.20 (I) 0.25 (J) 0.30


33. In a study to determine whether regular exercise could reduce triglyceride levels, researchers measured the triglycerides in the blood serum (mmol/L) of seven male volunteers, before and after participation in a 10-week exercise program. The results are shown in the table below.

Participant 1 2 3 4 5 6 7

Triglyceride Level Before 0.87 1.13 3.14 2.14 2.98 1.18 1.60

Triglyceride Level After 0.57 1.03 1.47 1.43 1.20 1.09 1.51

(a) Using standard notation, carefully define the population parameter at the centre of this study.

(b) Construct a 98% confidence interval for the parameter defined in part (a).

(c) State the assumptions underlying the statistical procedure carried out in part (b).

34. Prothrombin time is a measure of the clotting ability of blood. For ten rats treated with an antibiotic and ten control rats, the prothrombin times (in seconds) were reported as follows:

Sample Size Sample Mean Sample Std. Dev.

Antibiotic 10 25 10

Control 10 23 7

Do these data provide evidence that the true mean prothrombin time is different for the population of rats treated with an antibiotic than for the population of control rats? Assume the two population distributions are near normal.

(a) Using standard notation, define the population parameters being compared.



(d) Specify the distribution (including the degrees of freedom if relevant) to be used for computing the P-value, and compute (or bracket) the P-value within Table accuracy.




1. −0.667 (B) 2. 66.7% (D) 3. 0.375 (D) 4. 0.42 (H)

5. 0.763 (F) 6. 0.3955 (D) 7. 0.0476 (E) 8. (A)

9. 1.959 (G) 10. 21.49 (F) 11. 0.1304 (B) 12. 0.71125 (I)

13. 0.386 (C) 14. 9.667 (C) 15. 0.7765 (G) 16. 0.875 (E)

17. 0.847 (H) 18. 0.444 (D) 19. 0.4375 (I) 20. 0.5136 (B)

21. 0.6293 (G) 22. 0.3476 (F) 23. 0.0918 (E) 24. 40.04 (D)

25. (A) 26. 5.925 (H) 27. 0.01 < P-value < 0.02 (F) 28. 249 (J)

29. P-value = 0.1446 (A) 30. 0.2424 (G) 31. 187 (E) 32. 0.0217 (B)
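Several of the multiple-choice answers above follow directly from the binomial, Poisson, and Bayes' rule formulas. The Python sketch below recomputes Questions 12, 13, and 15 through 18 from the numbers given in the question setups. The helper names are illustrative only, and for Questions 12 and 13 it assumes that "administered independently" means the two test results are independent given the man's true disease status.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

# Questions 15 and 16: X ~ Binomial(6, 0.85).
n, p = 6, 0.85
p_at_least_5 = binom_pmf(5, n, p) + binom_pmf(6, n, p)
sd_x = math.sqrt(n * p * (1 - p))

# Question 17: 3 per week for two weeks gives a Poisson mean of 6.
p_at_most_8 = poisson_cdf(8, 6.0)

# Question 18: daily rate 3/7; then a Binomial(3, p0) count of event-free days.
p_no_event_day = math.exp(-3 / 7)
p_two_of_three = binom_pmf(2, 3, p_no_event_day)

# Questions 12 and 13: prevalence 0.25, sensitivity = 1 - false-negative rate.
prev = 0.25
sens_dre, fp_dre = 1 - 0.45, 0.50
sens_psa, fp_psa = 1 - 0.40, 0.35
p_both_neg = (prev * (1 - sens_dre) * (1 - sens_psa)
              + (1 - prev) * (1 - fp_dre) * (1 - fp_psa))
p_at_least_one_pos = 1 - p_both_neg
p_both_pos = prev * sens_dre * sens_psa + (1 - prev) * fp_dre * fp_psa
p_cancer_given_both_pos = prev * sens_dre * sens_psa / p_both_pos

for label, value in [("Q12", p_at_least_one_pos), ("Q13", p_cancer_given_both_pos),
                     ("Q15", p_at_least_5), ("Q16", sd_x),
                     ("Q17", p_at_most_8), ("Q18", p_two_of_three)]:
    print(f"{label}: {value:.4f}")
```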

33. (a) µD = true mean difference (triglyceride level before minus triglyceride level after)

(b) (−.213, 1.567)

(c) The 7 difference measurements constitute an observed random sample taken from a near normal population distribution.

34. (a) µ1 = true mean prothrombin time for rats treated with antibiotic. µ2 = true mean prothrombin time for control rats.

(b) H0 : µ1 − µ2 = 0 vs. H1 : µ1 − µ2 ≠ 0

(c) Tobs = 0.52. Since s1/s2 = 1.43 > 1.4, use unpooled t-procedures.

(d) P-value = P (T(16) ≥ 0.52 or T(16) ≤ −0.52) = 2P (T(16) ≥ .52) ∴ 0.50 < P-value < 0.80

(e) There is no evidence (0.50 < P-value < 0.80) against H0. The estimated value of µ1 − µ2 is 2 s, with estimated standard error = 3.86 s.
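The full-answer results for Questions 33 and 34 can be reproduced with the sketch below. It computes the paired-difference confidence interval from the triglyceride table and the unpooled (Welch) t statistic from the prothrombin summary table. The critical value 3.143 is an assumed t-table entry (upper-tail area 0.01, 6 degrees of freedom), and the Welch-Satterthwaite formula for the degrees of freedom is assumed to be the one intended by the answer's T(16).

```python
import math
import statistics

# Question 33: paired differences (before minus after) for the 7 participants.
before = [0.87, 1.13, 3.14, 2.14, 2.98, 1.18, 1.60]
after = [0.57, 1.03, 1.47, 1.43, 1.20, 1.09, 1.51]
diffs = [b - a for b, a in zip(before, after)]

d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)             # sample standard deviation (n - 1)
se_d = s_d / math.sqrt(len(diffs))
t_crit = 3.143                            # assumed table value: t(6), upper area 0.01
ci_98 = (d_bar - t_crit * se_d, d_bar + t_crit * se_d)

# Question 34: unpooled two-sample t statistic from the summary table.
n1, mean1, s1 = 10, 25.0, 10.0            # antibiotic
n2, mean2, s2 = 10, 23.0, 7.0             # control
v1, v2 = s1**2 / n1, s2**2 / n2
se_diff = math.sqrt(v1 + v2)
t_obs = (mean1 - mean2) / se_diff

# Welch-Satterthwaite degrees of freedom, rounded down to match the table.
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
df_used = math.floor(df_welch)

print(f"Q33 98% CI: ({ci_98[0]:.3f}, {ci_98[1]:.3f})")
print(f"Q34 SE = {se_diff:.2f}, Tobs = {t_obs:.2f}, df = {df_used}")
```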

15 EXERCISES 68

15 Exercises

Exercises on Central Limit Theorem and Large Sample Confidence Intervals

1. Suppose the mean value and standard deviation of interpupillary distance for all adult males are 65 mm and 5 mm, respectively. What is the probability that the sample mean interpupillary distance for a random sample of 100 adult males will fall between 64.2 and 65.8?

2. A random sample of 49 four-year-old Red Pine trees was selected, and the diameter of each tree’s main stem was measured. The sample mean diameter was found to be 14.64 cm and the sample standard deviation was 2.85 cm. Find a 95% confidence interval for the true mean diameter of four-year-old Red Pine trees in the sampled population.

3. Suppose the moisture content per kilogram of a dehydrated protein concentrate has mean 8 mg and standard deviation 2.4 mg. A random sample of 50 specimens, each consisting of one kilogram of this concentrate, is to be tested. Find the probability that the sample mean of these 50 measurements will be below 8.2 mg.

4. In an epidemiological study, the total organochlorines and PCB’s present in milk samples were recorded from a random sample of 42 donors in a certain region. The sample mean amount of organochlorines and PCB’s in milk was found to be 133.9 and the sample standard deviation was 100.4. Construct a 99% confidence interval for the true mean amount of organochlorines and PCB’s in milk produced in the sampled region.

5. Pears in a certain orchard have mean weight 140 grams and standard deviation 35 grams. What is the probability that the total weight of a random sample of 100 pears from this orchard will be more than 14.2 kilograms?

6. In a study on the nutritional qualities of fast foods, the amount of fat was measured for a random sample of 46 hamburgers from a particular restaurant chain. The sample mean and sample standard deviation were found to be 45.7 grams and 4.2 grams, respectively. Compute a 90% confidence interval for the mean fat content in hamburgers served in these restaurants.

Exercises on Large Sample Tests of Hypotheses

7. Researchers measured pulmonary compliance for each of 46 construction workers who had been exposed over a long period to asbestos. The sample mean pulmonary compliance was found to be 206 and the sample standard deviation was 31. Do these data provide substantial evidence that the true mean pulmonary compliance for such workers is lower than 220?

8. In a study of the cholesterol synthesis rate (CSR) of diabetic patients on a potato-rich diet, a random sample of 41 patients yielded a sample mean CSR of 2.5 mmol/day and a sample standard deviation of 1.7 mmol/day. Do these data suggest that the true mean CSR for diabetic patients on a potato-rich diet is larger than 2?

9. Analysis of the venom of 50 eight-day-old worker bees yielded a sample mean histamine content of 639 nanograms and a sample standard deviation of 202 nanograms. Do these results refute the claim that the true mean histamine content of the venom of eight-day-old worker bees is 600 nanograms?

10. An investigation of the use of coal dust for frost protection reported a sample mean soil heat flux of 30.9 and a sample standard deviation of 6.5 based on a random sample of 50 plots covered with coal dust. The mean soil heat flux for plots covered only with grass is 29.0. Do these data suggest that the coal dust is effective in increasing the mean heat flux over that of grass?

11. In a sample of 42 adolescents who served as the subjects in an immunologic study, one variable of interest was the diameter of skin test reaction to an antigen. The sample mean and sample standard deviation were 21 and 12 mm erythema, respectively. Can it be concluded from these data that the mean diameter of erythema caused by this antigen is less than 24 mm?

12. Each species of firefly has a unique flashing pattern. One species has a pattern that consists of one short pulse followed by a resting period thought to have an average length of approximately 4 seconds. Data on the resting time between flashes for a random sample of 50 fireflies of this species yielded a sample mean resting time of 3.88 seconds and a sample standard deviation of 0.46 seconds. Does the evidence contradict the hypothesized mean resting time of 4 seconds?

Exercises on Large Independent Samples Inferences on µ1 − µ2

13. In a study of iron deficiency among infants, one group contained breast-fed infants, while the children in the other group were fed a standard baby formula without any iron supplements. Here are summary results on blood hemoglobin levels at 12 months of age.

Group sample size sample mean sample std dev

Breast-fed 45 14.7 5.1

Formula 45 12.4 3.4

(a) Is there substantial evidence that the mean hemoglobin level for breast-fed babies is higher than the mean hemoglobin level for formula-fed babies?

(b) Compute a 90% C.I. for µ1 − µ2, where µ1 = true mean hemoglobin level for breast-fed babies, and µ2 = true mean hemoglobin level for formula-fed babies.

14. Ninety-three overweight patients were randomly divided into two groups. Group 1 was put on a program of dieting with exercise. Group 2 dieted only. The results for weight loss, in kilograms, after 2 months are summarized in the following table:

Group 1 Group 2

Sample Size 47 46

Sample Mean 8.4 7.3

Sample Std. Dev. 2.8 2.1

(a) Is there any evidence that the true mean weight losses differ between the two treatments?

(b) Compute a 95% C.I. for µ1 − µ2.

15. Leucocyte (white blood cell) counts in thoroughbred horses have been studied as a possible aid to the diagnosis of respiratory viral infections. The accompanying data on neutrophils (the most numerous kind of leucocyte) was reported in a comparative study of counts in healthy horses of different ages.

Age sample size sample mean sample std dev

2-year-olds 43 58 12.9

4-year-olds 49 54 10.1

(a) Is there significant evidence that true average neutrophil count for healthy 2-year-olds is higher than that for healthy 4-year-olds?

(b) Construct a 99% confidence interval for µ1 − µ2, where µ1 and µ2 denote the true average neutrophil counts for 2- and 4-year-old horses, respectively.

Answers for Exercises

1. 0.8904 2. (13.84, 15.44) 3. 0.7224

4. (94.0, 173.8) 5. 0.2843 6. (44.68, 46.72)

7. µ = true mean pulmonary compliance for such workers, H0 : µ = 220 vs H1 : µ < 220, Zobs = −3.06, P-value = 0.0011, very strong evidence against H0 : µ = 220, estimated value of µ is 206 with estimated standard error = 4.57

8. µ = true mean CSR for diabetic patients on a potato-rich diet (in mmol/day), H0 : µ = 2 vs H1 : µ > 2, Zobs = 1.88, P-value = 0.0301, strong evidence against H0 : µ = 2, estimated value of µ is 2.5 mmol/day with estimated standard error = 0.2655 mmol/day

9. µ = true mean histamine content of the venom of 8-day-old worker bees (in ng), H0 : µ = 600 vs H1 : µ ≠ 600, Zobs = 1.37, P-value = 0.1706, little or no evidence against the null hypothesis, estimated value of µ is 639 ng with estimated standard error = 28.57 ng

10. µ = true mean soil heat flux using coal dust, H0 : µ = 29 vs H1 : µ > 29, Zobs = 2.07, P-value = 0.0192, strong evidence against H0 : µ = 29, estimated value of µ is 30.9 with estimated standard error = 0.92

11. µ = true mean diameter of erythema caused by this antigen (in mm), H0 : µ = 24 vs H1 : µ < 24, Zobs = −1.62, P-value = 0.0526, moderate evidence against H0 : µ = 24, estimated value of µ is 21 mm with estimated standard error = 1.85 mm

12. µ = true mean resting time for this species of firefly (in sec), H0 : µ = 4 vs H1 : µ ≠ 4, Zobs = −1.84, P-value = 0.0658, moderate evidence against H0 : µ = 4, estimated value of µ is 3.88 sec with estimated standard error = 0.065 sec

13. (a) µ1 = true mean hemoglobin level for breast-fed babies, µ2 = true mean hemoglobin level for formula-fed babies, H0 : µ1 − µ2 = 0 vs H1 : µ1 − µ2 > 0, Zobs = 2.52, P-value = 0.0059, very strong evidence against H0 : µ1 − µ2 = 0, estimated value of µ1 − µ2 is 2.3 with estimated standard error = 0.914

(b) (0.80, 3.80)

14. (a) µ1 = true mean weight loss under diet and exercise (in kg), µ2 = true mean weight loss under diet only (in kg), H0 : µ1 − µ2 = 0 vs H1 : µ1 − µ2 ≠ 0, Zobs = 2.15, P-value = 0.0316, strong evidence against H0 : µ1 − µ2 = 0, estimated value of µ1 − µ2 is 1.1 kg with estimated standard error = 0.5125 kg

(b) (0.10, 2.10)

15. (a) µ1 = true mean neutrophil count for healthy 2-year-olds, µ2 = true mean neutrophil count for healthy 4-year-olds, H0 : µ1 − µ2 = 0 vs H1 : µ1 − µ2 > 0, Zobs = 1.64, P-value = 0.0505, moderate evidence against H0 : µ1 − µ2 = 0, estimated value of µ1 − µ2 is 4 with estimated standard error = 2.44

(b) (−2.28, 10.28)
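The one-sample answers above all use the same two large-sample recipes: the interval x̄ ± z·s/√n and the statistic Zobs = (x̄ − µ0)/(s/√n). A minimal Python sketch of both follows, applied to Exercises 2, 7, and 12. The function names are illustrative, and the normal quantiles come from the standard library rather than from the course's Normal table, so results may differ from the printed answers in the last digit.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def z_interval(xbar, s, n, conf):
    """Large-sample confidence interval for a mean: xbar +/- z * s / sqrt(n)."""
    z = STD_NORMAL.inv_cdf(1 - (1 - conf) / 2)
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def z_test(xbar, s, n, mu0, alternative):
    """Large-sample z statistic and P-value; alternative is 'less', 'greater' or 'two-sided'."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    if alternative == "less":
        p = STD_NORMAL.cdf(z)
    elif alternative == "greater":
        p = 1 - STD_NORMAL.cdf(z)
    else:
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p

ci_ex2 = z_interval(14.64, 2.85, 49, 0.95)               # Exercise 2
z_ex7, p_ex7 = z_test(206, 31, 46, 220, "less")          # Exercise 7
z_ex12, p_ex12 = z_test(3.88, 0.46, 50, 4, "two-sided")  # Exercise 12

print(f"Ex 2: ({ci_ex2[0]:.2f}, {ci_ex2[1]:.2f})")
print(f"Ex 7: Zobs = {z_ex7:.2f}, P-value = {p_ex7:.4f}")
print(f"Ex 12: Zobs = {z_ex12:.2f}, P-value = {p_ex12:.4f}")
```

For Exercise 12 the unrounded statistic gives a P-value slightly below the printed 0.0658, which is what a table lookup at Zobs rounded to −1.84 produces.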

16 MINITAB ASSIGNMENTS 72

16 Minitab Assignments

16.1 Minitab Assignment 1

In a certain rat study the following measurements of weight (in grams) have been recorded for a common strain of forty 31-day old rats.

119 124 109 117 131 135 116 110 97 118

129 135 136 112 106 107 123 115 119 107

118 107 119 141 118 125 114 124 102 106

118 124 115 117 127 109 122 115 118 125

Use Minitab to construct a Density Histogram for these data. Follow the guidelines below.

Open Minitab > enter the data into rows 1 through 40 of Column C1, and in the empty box just below the C1 label name this column Weight (in grams) > Graph > Histogram... > Simple > Graph variables: C1 > Scale... > Y-Scale Type: Density > OK > Labels... > Title: Rat Study > OK > OK

With the graph in the active window, click: Edit > Copy Graph

Now you can open a word processor (such as MS Word or WordPerfect) and paste the graph into a document.

Your submission should be one page (no title page) with Name Block in the upper left-hand corner. The Name Block should consist of your last name underlined, followed by a comma and your first name not underlined on line 1, your student number on line 2, and S255 MTB1 on line 3. An example of a correct name block is:

Doe, Jane
0012345
S255 MTB1

The body of the submission should consist of the Histogram with title at the top, and the name for the horizontal axis should include units.

Maximum score is 5 marks. One mark will be deducted for each separate error. Possible errors include: failure to precisely follow the instructions for the Name Block, data entry error, incorrect or missing title, etc.
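To see what the Density scale on the vertical axis means, the following Python sketch computes density-histogram bar heights for the rat weights by hand: each height is (class count)/(n × class width), so the bar areas sum to 1. The class boundaries (width 5 g starting at 95 g) are an assumption made for illustration; Minitab chooses its own bins, so its picture may differ.

```python
# Rat weights (grams) from the table in this assignment.
weights = [
    119, 124, 109, 117, 131, 135, 116, 110, 97, 118,
    129, 135, 136, 112, 106, 107, 123, 115, 119, 107,
    118, 107, 119, 141, 118, 125, 114, 124, 102, 106,
    118, 124, 115, 117, 127, 109, 122, 115, 118, 125,
]

# Assumed classes: [95, 100), [100, 105), ..., [140, 145).
bin_width = 5
lower = 95
edges = list(range(lower, 146, bin_width))
counts = [0] * (len(edges) - 1)

for w in weights:
    index = (w - lower) // bin_width   # class that contains this weight
    counts[index] += 1

n = len(weights)
# Density height = relative frequency divided by class width.
density = [c / (n * bin_width) for c in counts]
total_area = sum(d * bin_width for d in density)

for left, c, d in zip(edges, counts, density):
    print(f"[{left}, {left + bin_width}): count = {c:2d}, density = {d:.3f}")
print(f"total area = {total_area:.3f}")
```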



The table below gives the heights (in cm) of the group of 200 adult male patients seen at a particular clinic. It is desired to estimate µ, the true mean height for this population of 200 patients, using the recorded height measurements from a simple random sample of 5 patients from the group.

(a) Open Minitab. Load the population of numbers 1 through 200 into the first 200 rows of column C1, name this column Population, and display this population in the Session window. These tasks are quickly done by the following sequence of menu choices:

Calc > Make Patterned Data > Simple Set of Numbers... > Store patterned data in: Population > From first value: 1 > To last value: 200 > OK > Data > Display Data... > Columns, constants, and matrices to display: Population > OK

Note: Do NOT enter the 200 height measurements into the Minitab Worksheet.


(b) Ask Minitab to choose a simple random sample of size 5 from the Population of 200 numbers, store the sample in a column named Sample, and display this sample in the Session window. Use the following sequence:

Calc > Random Data > Sample From Columns... > Sample 5 rows from column(s): Population > Store samples in: Sample > OK > Data > Display Data... > Columns, constants, and matrices to display: Sample > OK

Now select the two data displays (the Population display and the Sample display) from the Session window, copy them to the clipboard, and then paste them into a word processing document. This completes the Minitab portion of Assignment 2.

(c) Use your hand calculator to compute x̄, the sample mean height based on the 5 recorded height measurements for the 5 patients chosen by Minitab in part (b). Give your answer in expanded form, and use the word processor’s equation editor (in MS Word: Insert > Object... > Microsoft Equation...; in WordPerfect: Insert > Equation...) to create the equation in your document. For example, if the 5 patients chosen by Minitab were identified by patient numbers 117, 54, 194, 27, and 59, then the correct answer for part (c) would be:

x̄ = (164.7 + 169.6 + 173.1 + 172.6 + 174.6)/5 = 170.92 cm

Don’t use the high over-bar x. In Equation Editor type x first and then find the menu plate that contains several over-symbols, including x̄.


Doe, Jane
0012345
S255 MTB2

The body of the submission should consist of three items: data display of the Population of numbers, data display of the Sample of 5 numbers chosen by Minitab, and your estimate of height.

Maximum score is 5 marks. One mark will be deducted for each separate error, including failure to precisely follow the instructions for the Name Block.
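The sampling step in part (b) and the averaging step in part (c) can be mimicked outside Minitab as follows. This is only an illustrative sketch: the seed value is arbitrary, the sample drawn will not match any particular Minitab run, and the five heights are the ones from the worked example in part (c), since the 200-row height table is not reproduced here.

```python
import random

# Part (a): population of patient numbers 1 through 200.
population = list(range(1, 201))

# Part (b): simple random sample of 5 patient numbers, without replacement.
rng = random.Random(255)            # arbitrary seed, used only for repeatability
sample_ids = rng.sample(population, 5)
print("Sampled patient numbers:", sample_ids)

# Part (c): sample mean of the five heights in the worked example
# (patients 117, 54, 194, 27 and 59).
example_heights = [164.7, 169.6, 173.1, 172.6, 174.6]
x_bar = sum(example_heights) / len(example_heights)
print(f"x-bar = {x_bar:.2f} cm")
```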



Analysis of the venom of 8-day-old worker bees yielded the following random sample of 12 observations on X, the histamine content in the venom of a randomly chosen 8-day-old worker bee (in nanograms):

565 616 600 633 612 530 476 562 650 544 670 493

If population random variable X has a near-normal distribution, then t-procedures can be used to make inferences about the true mean histamine content for all worker bees of this age.

(a) Run a Normality Test on these data. Follow the guidelines below.

Open Minitab > enter the data into rows 1 through 12 of Column C1, and in the empty box just below the C1 label name this column Histamine > Stat > Basic Statistics > Normality Test... > Variable: C1 > Title: Worker Bee Study > OK

With the graph in the active window, click: Edit > Copy Graph

Now you can open a word processor (such as MS Word or WordPerfect) and paste the graph into a document.

(b) Specify H0 and H1 for the Anderson-Darling Normality Test. (Note that this test is about the general shape of the population distribution, not about parameter values.)

(c) Give your conclusion based on the Anderson-Darling P-value. (Note that your conclusion should not include estimates of parameter values.)


Doe, Jane
0012345
S255 MTB3

The body of the submission should consist of three items: Normal plot including Anderson-Darling Normality Test output, your specification of H0 and H1, and your conclusion based on Anderson-Darling P-value.

Maximum score is 5 marks. One mark will be deducted for each separate error, including failure to precisely follow the instructions for the Name Block.
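For readers curious about what the Anderson-Darling statistic measures, the sketch below computes A² for the histamine data using the usual formula A² = −n − (1/n) Σ (2i − 1)[ln F(x(i)) + ln(1 − F(x(n+1−i)))], where F is the Normal distribution function fitted with the sample mean and sample standard deviation. It is a sketch of the statistic only. The P-value that Minitab reports involves a further adjustment and approximation that is not reproduced here, so use the Minitab output for parts (b) and (c).

```python
import math
from statistics import NormalDist, mean, stdev

# Histamine content (ng) for the 12 worker bees in this assignment.
histamine = [565, 616, 600, 633, 612, 530, 476, 562, 650, 544, 670, 493]

def anderson_darling_normal(data):
    """Anderson-Darling A^2 for normality, with mean and sd estimated from the data."""
    n = len(data)
    xs = sorted(data)
    fitted = NormalDist(mean(xs), stdev(xs))
    cdf = [fitted.cdf(x) for x in xs]
    # Term i pairs the i-th smallest value with the i-th largest value.
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n

a_squared = anderson_darling_normal(histamine)
print(f"A-squared = {a_squared:.3f}")
```

Small values of A² indicate that the ordered data track the fitted Normal distribution closely.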



Part 1

It was known that a toxic material was dumped in a river leading into a large saltwater commercial fishing area. Civil engineers studied the way the water carried the toxic material by measuring the amount of the material (in parts per million) found in oysters harvested at three different locations, ranging from the estuary out into the bay where the majority of commercial fishing was carried out. The data are given in the table below. It is desired to test for differences in the true mean parts per million of toxic material found in oysters harvested at the three sites.

Site 1 (estuary) 22 25 24 16 23 20 20 21

Site 2 (near bay) 14 13 20 22 18 20 25 19 21

Site 3 (far bay) 19 15 19 21 16 14 17 18

(a) Run a One-way ANOVA with Tukey’s pairwise comparisons. Use the following sequence:

Open Minitab > enter the data into column C1 (put the Site 1 data in rows 1 through 8, put the Site 2 data in rows 9 through 17, and put the Site 3 data in rows 18 through 25) and name this column Response > put identifiers for the three samples in Column C2 (put the number 1 in each of the rows 1 through 8, put the number 2 in each of the rows 9 through 17, and put the number 3 in each of the rows 18 through 25) and name this column Factor > Stat > ANOVA > Oneway... > Response: Response > Factor: Factor > Comparisons... > ☑ Tukey’s, family error rate: 5 > OK > OK

Now select the One-way Analysis of Variance output from the Session window, copy it to the clipboard, and then paste it into a word processing document. (Follow the layout instructions given at the end of Part 4.)

(b) Define the three population means µ1, µ2, µ3.

(c) Specify H0 and H1 for testing for differences in means, and interpret the P-value.

(d) Interpret Tukey’s 95% simultaneous confidence intervals. Use these results to identify which pairs of means are different; and for each unequal pair, identify which mean is larger.
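As a cross-check on the One-way ANOVA table that Minitab prints, the sums of squares and the F statistic can be computed directly from the oyster data. The sketch below does this with plain Python. It stops at the F statistic and its degrees of freedom, since the P-value and Tukey's intervals require F and studentized-range tables (or Minitab).

```python
from statistics import mean

# Toxic material (ppm) in oysters at the three sites, from the table above.
sites = {
    "Site 1 (estuary)": [22, 25, 24, 16, 23, 20, 20, 21],
    "Site 2 (near bay)": [14, 13, 20, 22, 18, 20, 25, 19, 21],
    "Site 3 (far bay)": [19, 15, 19, 21, 16, 14, 17, 18],
}

all_obs = [x for group in sites.values() for x in group]
grand_mean = mean(all_obs)
k = len(sites)        # number of groups
n = len(all_obs)      # total number of observations

# Between-groups and within-groups sums of squares.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in sites.values())
ss_within = sum((x - mean(g)) ** 2 for g in sites.values() for x in g)

df_between, df_within = k - 1, n - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_obs = ms_between / ms_within

for name, g in sites.items():
    print(f"{name}: n = {len(g)}, mean = {mean(g):.3f}")
print(f"F = {f_obs:.2f} on ({df_between}, {df_within}) degrees of freedom")
```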

Part 2

Teaching diabetics to measure their own blood glucose has been of great benefit. A new technique that is less expensive than the current procedure is under investigation. The technique uses a glucose oxidase stick. The stick develops two colours simultaneously, and these colours are matched by eye to a chart that gives the glucose level. If this procedure can be shown to be accurate, it can be put into widespread use. The data shown in the table below are obtained on X, the blood glucose level as measured by a diabetic patient using the new glucose oxidase stick, and Y, the patient’s blood glucose level as measured in a laboratory test. The data are given in millimoles per litre.


x 7.2 2.5 10.2 9.8 8.5 3.6 5.0 5.0 12.5 4.1 3.7 17.5 3.9 3.7 4.0

y 6.2 2.2 12.8 11.2 8.0 4.4 6.2 4.5 13.1 4.0 3.1 16.0 4.3 3.5 4.5

(a) Run Regression of Y on X, including a 95% confidence interval for the mean laboratory-reported glucose level of a patient who reports the level to be 8.0 mmol/L. Use the followingsequence:

Open Minitab I enter the fifteen x, y pairs into rows 1 through 15 of columns C1 andC2 such that the x-coordinate goes into C1 and the y-coordinate goes into C2 of the samerow, and name these two columns x and y, respectively I Stat I Regression I Re-gression... I Response: y I Predictors: x I Options... I Prediction intervals fornew observations: 8.0 I OK I OK

Now select the Regression output from the Session window, copy it to the clipboard, andthen paste it into your word processing document. (Follow the layout instructions given atthe end of Part 4.)

(b) Find a point estimate for the mean laboratory-reported glucose level of a patient who reports the level to be 8.0 mmol/L, and a 95% confidence interval for this mean.

(c) Find the proportion of y variation explained by the Simple Linear Regression model.

(d) How strong is the evidence that the mean laboratory-reported glucose level linearly depends on the patient-reported level? Include the appropriate P-value in your answer.

(e) Run a Fitted Line Plot. Use the following sequence:

Stat > Regression > Fitted Line Plot... > Response [Y]: y > Predictor [X]: x > Type of Regression Model: Linear > Options... > Title: Blood Glucose Measurement Study > OK > OK

With the graph in the active window, click: Edit > Copy Graph. Now paste the Fitted Line Plot into your word processing document. (Follow the layout instructions given at the end of Part 4.)
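The quantities asked for in (b) through (d) can be cross-checked by hand against the Minitab output. The following is a minimal sketch in Python, not part of the assignment, that applies the usual least-squares formulas to the fifteen pairs above. The variable names are illustrative, and the critical value 2.160 is an assumption taken from a t table (upper 0.025 point, n - 2 = 13 degrees of freedom).

```python
import math

# Data from the Part 2 table (mmol/L): x = patient reading, y = laboratory reading.
x = [7.2, 2.5, 10.2, 9.8, 8.5, 3.6, 5.0, 5.0, 12.5, 4.1, 3.7, 17.5, 3.9, 3.7, 4.0]
y = [6.2, 2.2, 12.8, 11.2, 8.0, 4.4, 6.2, 4.5, 13.1, 4.0, 3.1, 16.0, 4.3, 3.5, 4.5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares estimates of the slope and intercept.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Error sum of squares, proportion of y variation explained, residual std. deviation.
sse = syy - slope * sxy
r_squared = 1 - sse / syy
s = math.sqrt(sse / (n - 2))

# t statistic for H0: slope = 0 (compare with the T column in the Minitab output).
t_slope = slope / (s / math.sqrt(sxx))

# Assumption: t table value with 13 degrees of freedom, 0.025 in the upper tail.
T_CRIT = 2.160

# Point estimate and 95% confidence interval for the mean of Y at x0 = 8.0.
x0 = 8.0
fit = intercept + slope * x0
se_fit = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
ci = (fit - T_CRIT * se_fit, fit + T_CRIT * se_fit)

print(f"fitted line: y = {intercept:.4f} + {slope:.4f} x")
print(f"R-squared = {r_squared:.4f}, t for slope = {t_slope:.2f}")
print(f"fit at x = {x0}: {fit:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The P-value for (d) still has to be read from the Minitab output or a t table, since the sketch stops at the t statistic.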

Part 3

A study is run to investigate the association between flower colour and fragrance in wild azaleas. Two hundred randomly selected, blooming plants are observed in the wild. Each is classified as to colour, and the presence or absence of fragrance is noted. The data are shown in the following table.

                 Flower Colour
Fragrance    White    Pink    Orange
Yes             55      32        24
No              40      24        25
                        Total    200


(a) Run a Chi-Square Test to test whether or not there is an association between fragrance and flower colour. Use the following sequence:

Open Minitab > enter the data into the first two rows of columns C1, C2, and C3 in the same layout as shown in the table above > Stat > Tables > Chi-Square Test (Table in Worksheet)... > Columns containing the table: C1 C2 C3 > OK

Now select the Chi-Square Test output from the Session window, copy it to the clipboard, and then paste it into your word processing document. (Follow the layout instructions given at the end of Part 4.)


(c) Give your conclusion based on the P-value.
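As a cross-check on the Chi-Square Test output, the expected counts and the test statistic can be computed directly from the table. The sketch below is not part of the assignment; the function name chi_square_test is hypothetical, and the P-value uses a closed form for the chi-square upper tail that holds only when the degrees of freedom are even (here (2 - 1)(3 - 1) = 2).

```python
import math

def chi_square_test(table):
    """Pearson chi-square test for an r x c table of observed counts.

    Returns (statistic, df, expected, p_value). The p-value is computed with
    the closed-form upper tail that is valid for even degrees of freedom only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count = (row total)(column total) / (grand total).
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over all cells.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df % 2 != 0:
        raise ValueError("this sketch only handles even degrees of freedom")

    # For df = 2k: P(X > stat) = exp(-stat/2) * sum_{j<k} (stat/2)^j / j!
    half = stat / 2
    p_value = math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(df // 2)
    )
    return stat, df, expected, p_value

# Rows: fragrance Yes / No. Columns: White, Pink, Orange.
azaleas = [[55, 32, 24], [40, 24, 25]]
stat, df, expected, p_value = chi_square_test(azaleas)

print(f"chi-square = {stat:.3f}, df = {df}, P-value = {p_value:.3f}")
for row in expected:
    print("expected:", [round(e, 2) for e in row])
```

The expected counts printed here should agree with those Minitab lists beneath each observed count.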

Part 4

A study of the different leaf marks found on white clover plants is conducted. Each of 829 plants randomly chosen from long-grass areas is classified into one of five different leaf mark categories. Also, each of 658 plants randomly chosen from short-grass areas is similarly classified. The data are given in the following table:

                             Type of mark              Sample
                      L    LL   Y+YL    O   Others      Size
Long-grass areas    532    11     22    7      257       829
Short-grass areas   390    23     14   11      220       658

(a) Run a Chi-Square Test to test whether or not the true proportions of different marks are identical for the two types of areas. Use the following sequence:

Open Minitab > enter the data into the first two rows of columns C1, C2, C3, C4 and C5 in the same layout as shown in the table above > Stat > Tables > Chi-Square Test (Table in Worksheet)... > Columns containing the table: C1 C2 C3 C4 C5 > OK

Now select the Chi-Square Test output from the Session window, copy it to the clipboard, and then paste it into your word processing document.


(c) Give your conclusion based on the P-value.
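When interpreting this test it can help to look at the sample proportions being compared and to confirm that no expected count is too small (a common rule of thumb asks for expected counts of at least 5). The following is a minimal sketch under those assumptions, not part of the assignment; all variable names are illustrative.

```python
marks = ["L", "LL", "Y+YL", "O", "Others"]
counts = {
    "Long-grass": [532, 11, 22, 7, 257],
    "Short-grass": [390, 23, 14, 11, 220],
}

row_totals = {area: sum(row) for area, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(row_totals.values())

# Sample proportion of each mark type within each area.
proportions = {
    area: [c / row_totals[area] for c in row] for area, row in counts.items()
}

# Expected counts if the true proportions were identical in both areas.
expected = {
    area: [row_totals[area] * c / grand_total for c in col_totals]
    for area in counts
}
smallest_expected = min(min(row) for row in expected.values())

for i, mark in enumerate(marks):
    long_p = proportions["Long-grass"][i]
    short_p = proportions["Short-grass"][i]
    print(f"{mark:>6}: long-grass {long_p:.3f}, short-grass {short_p:.3f}")
print(f"smallest expected count = {smallest_expected:.2f}")
```

Large gaps between the two columns of proportions point to the mark types that contribute most to the chi-square statistic.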

Your submission should be four pages left-corner stapled (no title page) with Name Block in the upper left-hand corner of page 1. The Name Block should consist of your last name underlined, followed by a comma and your first name not underlined on line 1, your student number on line 2, and S255 MTB4 on line 3. Page 1 should contain Part 1, pages 2 and 3 should contain Part 2 with (a), (b), (c), (d) on page 2 and (e) on page 3, and page 4 should contain Parts 3 and 4.

Maximum score is 15 marks. One mark will be deducted for each separate error.
