Statistics Lecture Notes – Tests of Hypothesis. Bautista
TESTS OF HYPOTHESIS
INTRODUCTION
Consider an experiment of tossing a coin 100 times. From the discussions on probability,
we are able to compute the theoretical probability of getting a head as 1/2, which is also the
same probability we have for getting a tail. Thus, assuming we have a fair coin, we would
expect 50 heads and 50 tails from our experiment. However, this obviously does not always
happen.
Let’s say we did this experiment and ended up with 48 heads and 52 tails. We would say
that this outcome would still be acceptable since the values are close to 50. What happens
though if we conduct the experiment and it results in only 46 heads? 42 heads? 30 heads?
What values would lead us to believe that the coin is not fair?
Our concern then in hypothesis testing is finding the critical value beyond which an
observed result leads us to reject our original notion or assumption. These critical values are
determined using different distributions, depending on the parameter being tested. Some of the
distributions we will be using are the Z-distribution, T-distribution, χ²-distribution, and the
F-distribution.
So if we decide to test the fairness of our coin, we may test if the proportion of heads in
the experiment, denoted by p, is actually equal to 1/2. Alternatively, we can also test if the mean
number of heads that appear in the experiment, denoted by μ, is actually equal to 50. These
statements which we test are called null hypotheses. A null hypothesis is the “default”
statement, and it is the one we are testing.
In most cases, we would want to reject the null hypothesis in favor of what we call the
alternative hypothesis. This is the hypothesis stating that the parameter differs from the value
stated in the null hypothesis. Let’s say we would like to test the mean number of heads that appear
in our sample experiment. Our null hypothesis, denoted by HO, would be
HO: μ = 50
We note that the null hypothesis always contains the equal sign, as this is the “default”
statement. Given this null hypothesis, our possible alternative hypotheses, denoted by H1 or HA,
would be
H1: μ > 50, H1: μ < 50, H1: μ ≠ 50
The first two alternative hypotheses are the ones we use in one-tailed tests, while the
last one is the one we use in two-tailed tests. Thus if our alternative hypothesis is H1: μ ≠ 50, it
means we are testing if μ > 50 or μ < 50.
Given our data, if we have sufficient evidence that the null hypothesis is not true, then
we reject the null hypothesis in favor of the alternative hypothesis. We may also say that we
accept the alternative hypothesis. Let’s say we establish critical values of 40 and 60 for our
previous example. This means that if we conduct the experiment and it results in less than 40
heads, or more than 60 heads, then our null hypothesis that μ = 50 is not true and hence we
reject it.
On the other hand, if the experiment results in, say, 43 heads, then this value is still
inside our acceptance interval and hence we say that we still accept the null hypothesis. Note
that acceptance of the null hypothesis doesn’t mean that it is true; it only means that there was
insufficient evidence to reject it. For this reason it is more precise to say that we fail to reject
the null hypothesis.
Now if the experiment results in 35 heads, we say that we reject the null hypothesis. The
reason for this is either a rare event has occurred, or the null hypothesis is actually not true.
Hence, there is still room for error when we do these tests of hypothesis. When we reject
the null hypothesis when in fact it is true, then we have committed a type I error. The probability
of this error is denoted by α, and is also called the level of significance.
The other type of error is when we fail to reject the null hypothesis when in fact it is false.
This type of error is called a type II error. The probability of this error is denoted by β. The
power of the test is computed by 1 – β. These notions are summarized in the following table.
            Null Hypothesis is True     Null Hypothesis is False
Accept      Correct Decision            Type II Error (β)
Reject      Type I Error (α)            Correct Decision
We would want to minimize these errors as much as possible. However, for a fixed sample
size, as we decrease the probability of a type I error, we increase the probability of a type II
error, and vice versa. Hence we try to find a balance between these two types of errors. In
general, if we increase the sample size, then the probabilities of both errors can be decreased.
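As a rough illustration of these two error probabilities, the sketch below computes α and β for the coin example under the rule "reject HO if there are fewer than 40 or more than 60 heads". The alternative value p = 0.6 used for β is an assumption chosen only for illustration, and the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_not_rejected(n, p, low, high):
    """Probability that the number of heads lands in [low, high], so HO is not rejected."""
    return sum(binom_pmf(k, n, p) for k in range(low, high + 1))

n = 100
# Rule from the text: reject HO if heads < 40 or heads > 60.
alpha = 1 - prob_not_rejected(n, 0.5, 40, 60)  # type I error: coin is fair, but we reject
beta = prob_not_rejected(n, 0.6, 40, 60)       # type II error if the true p were 0.6 (assumed)
power = 1 - beta

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```

Widening the acceptance interval (say, to 35 and 65) would lower α but raise β, which is the trade-off described above.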
STEPS IN HYPOTHESIS TESTING
The following steps will be followed for each of our tests of hypothesis. For organization
and neatness of our solutions, these steps should be outlined when solving our examples and
exercises.
1. Write the null and alternative hypotheses.
2. Indicate the level of significance.
3. Establish the critical regions and the rejection criterion.
4. Compute the test statistic.
5. Decide the conclusion of the test.
The information required in each of these steps should be given in the problem. The test
statistics for each test of hypothesis will be outlined in the next sections.
TESTS CONCERNING MEANS
For tests on single means, we are concerned with testing whether the mean μ is
really equal to a predetermined value μO. Thus the null hypothesis μ = μO would be tested
against the alternative hypothesis μ < μO or μ > μO for a one-tailed test, and against μ ≠ μO for a
two-tailed test.
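For tests on means for two populations, we would be testing the null hypothesis
$\mu_1 - \mu_2 = d_0$, where $d_0$ is a specified value, against the alternatives
$\mu_1 - \mu_2 < d_0$ and $\mu_1 - \mu_2 > d_0$ for the one-tailed test, and
$\mu_1 - \mu_2 \neq d_0$ for the two-tailed test. For paired observations, the null hypothesis
would be $\mu_D = d_0$ against the alternatives $\mu_D < d_0$ and $\mu_D > d_0$ for the
one-tailed test and $\mu_D \neq d_0$ for the two-tailed test.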
In these tests, we will be using the same cases, statistics, and distributions outlined in
the previous chapter. Hence, if we are testing for a single mean where the population standard
deviation, σ, is known, we may use the Z distribution. For a two-tailed test there are two
critical values: the z-value which gives an area of α/2 to its left and the z-value which gives
an area of α/2 to its right.
[Figure: standard normal curve for the two-tailed test, with central area 1 − α, an area of α/2 in each tail, and critical values labeled z1−α/2 and zα/2]
Thus if our computed test statistic falls to the left of the lower critical value or to the right
of the upper critical value, then this would result in a rejection of the null hypothesis. For the
one-tailed tests, the following critical values apply:
[Figure: standard normal curve for a one-tailed test, with area 1 − α on one side of the critical value labeled zα and area α in the tail]
For the one-tailed test μ < μO, the null hypothesis would be rejected if the test statistic is
less than the critical value which gives an area of α to its left. Note that we use α instead
of α/2 since we are only testing on one side of the standard normal distribution. Similarly, for the
one-tailed test μ > μO, the null hypothesis would be rejected if the computed test statistic is
greater than the critical value which gives an area of α to its right.
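The critical values described above can also be computed rather than read from a table. The following is a minimal sketch using Python's standard library; the function name and the tail labels "lower", "upper", and "two" are assumptions made for this illustration.

```python
from statistics import NormalDist

def z_critical(alpha, tail):
    """Return the critical value(s) of the standard normal distribution.

    tail is "lower", "upper", or "two" (hypothetical labels for this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "lower":
        return std_normal.inv_cdf(alpha)           # area alpha to the left
    if tail == "upper":
        return std_normal.inv_cdf(1 - alpha)       # area alpha to the right
    if tail == "two":
        upper = std_normal.inv_cdf(1 - alpha / 2)  # area alpha/2 in each tail
        return -upper, upper
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

lower_cv, upper_cv = z_critical(0.01, "two")
print(round(lower_cv, 3), round(upper_cv, 3))
```

For α = 0.01 this gives values of about ±2.576, which agrees with the table value of 2.575 used in these notes up to rounding.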
Example 1: A manufacturer of sports equipment has developed a new synthetic fishing line that
he claims has a mean breaking strength of 8 kilograms with a standard deviation of 0.5
kilograms. Test the hypothesis that μ = 8 kg against the alternative that μ ≠ 8 kg if a random
sample of 50 lines is tested and found to have a mean breaking strength of 7.8 kg. Use a 0.01
level of significance.
Solution:
We follow the five steps listed above:
1. HO: μ = 8 kg (null hypothesis)
H1: μ ≠ 8 kg (alternative hypothesis)
2. α = 0.01 (level of significance)
3. Based on the standard normal table, the critical values for α = 0.01 would be z = -2.575
and z = 2.575. Thus we reject HO if our test statistic z is less than -2.575 or greater than
2.575 (z < -2.575 or z > 2.575).
4. From the previous section, we use a similar test statistic when the population standard
deviation σ is given (or if n>30),
$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Thus, using the values given in the problem, we compute our test statistic z to be
$z = \dfrac{7.8 - 8}{0.5/\sqrt{50}} = -2.83$
Since -2.83 is less than -2.575, we have sufficient evidence to reject the null hypothesis,
at the 0.01 level of significance.
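[Figure: standard normal curve for a one-tailed test, with areas 1 − α and α on either side of the critical value labeled z1−α]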
5. Conclusion: We reject the manufacturer’s claim that the new fishing line’s mean breaking
strength is 8 kg. At a 0.01 level of significance, we have sufficient evidence to say that
the true mean breaking strength is not 8 kg; the sample indicates that it is, in fact, lower.
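The arithmetic in Example 1 can be checked with a short script. This is only a sketch of the five steps for the known-σ case; the function name is hypothetical, and the critical value is computed from the standard normal distribution instead of being read from the table.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, mu_0, sigma, n, alpha):
    """Two-tailed z test for a single mean when sigma is known."""
    z = (sample_mean - mu_0) / (sigma / sqrt(n))    # test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # upper critical value
    reject = abs(z) > critical                      # rejection criterion
    return z, critical, reject

# Values from Example 1: sample mean 7.8, mu_0 = 8, sigma = 0.5, n = 50, alpha = 0.01
z, critical, reject = z_test_mean(7.8, 8.0, 0.5, 50, 0.01)
print(f"z = {z:.2f}, critical values = ±{critical:.3f}, reject HO: {reject}")
```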
The rest of the test statistics for the other cases for single populations and two
populations are listed in the table below. Notice the similarity with the statistics used in the
previous chapter.
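Null Hypothesis HO
Test Statistic

Single Population
μ = μO
Case 1: If σ is known, or n≥30
    $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ (with s in place of σ when σ is unknown)
μ = μO
Case 2: If σ is unknown, and n<30
    $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with $v = n - 1$

Two Populations
$\mu_1 - \mu_2 = d_0$
Case 1: If $\sigma_1$ and $\sigma_2$ are known, or unknown but $n_1 \geq 30$ and $n_2 \geq 30$
    $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
Case 2: If $\sigma_1$ and $\sigma_2$ are unknown and $n_1 < 30$ and $n_2 < 30$, but the variances are assumed equal
    $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p\sqrt{1/n_1 + 1/n_2}}$, with $v = n_1 + n_2 - 2$ and $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Case 3: If $\sigma_1$ and $\sigma_2$ are unknown and $n_1 < 30$ and $n_2 < 30$, but the variances are assumed unequal
    $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, with $v = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$

Paired Observations
$\mu_D = d_0$
    $t = \dfrac{\bar{d} - d_0}{s_d/\sqrt{n}}$, with $v = n - 1$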
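We also note that in general, if the computed test statistic is z, we compare it with the
critical value that has an area of α to its left for a lower tail test, or with the critical value
that has an area of α to its right for an upper tail test, and with the two critical values that
cut off an area of α/2 in each tail for a two-tailed test. If the computed test statistic is t,
then we look for the corresponding critical values from the T distribution table, with the
corresponding degrees of freedom, v, listed in the previous table.
Alternative Hypothesis    Reject HO if
μ < μO                    the test statistic is less than the lower critical value (area α to its left)
μ > μO                    the test statistic is greater than the upper critical value (area α to its right)
μ ≠ μO                    the test statistic is less than the lower critical value or greater than the upper critical value (area α/2 in each tail)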
We also note that when testing the equality of the means of two populations, we test
if the difference $\mu_1 - \mu_2$ is equal to a specified value $d_0$. If we are specifically testing if the
means of the two populations are equal, then we set $d_0 = 0$, and proceed with testing the
hypothesis that $\mu_1 - \mu_2 = 0$.
Example 2: A random sample of 100 recorded deaths in the United States during the past year
showed an average life span of 71.8 years, with a standard deviation of 8.9 years. Does this
seem to indicate that the average life span today is greater than 70 years? Use a 0.05 level of
significance.
Example 3: The average length of time for students to register for fall classes at a certain college
has been known to be 50 minutes with a standard deviation of 10 minutes. A new registration
procedure using modern computing machines is being implemented. If a random sample of 12
students had an average registration time of 42 minutes with a standard deviation of 11.9
minutes under the new system, test the hypothesis that the population mean is now less than
50, using a level of significance of 0.05. Assume the population of times to be normal.
Example 4: A course in mathematics is taught to 12 students by the conventional classroom
procedure. A second group of 10 students was given the same course by means of
programmed materials. At the end of the semester the same examination was given to each
group. The 12 students meeting in the classroom made an average grade of 85 with a standard
deviation of 4, while the 10 students using programmed materials made an average of 81 with a
standard deviation of 5. Test the hypothesis that the two methods of learning are equal using a
0.10 level of significance. Assume the populations to be approximately normal with equal
variances.
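As a hedged sketch of how the pooled-variance case (two small samples, variances assumed equal) might be computed for Example 4, the script below evaluates the t statistic with $d_0 = 0$. The function name is hypothetical, and the critical value 1.725 is the standard t-table entry for v = 20 with an area of 0.05 in each tail; it is typed in by hand, not computed.

```python
from math import sqrt

def pooled_t(mean1, s1, n1, mean2, s2, n2, d0=0.0):
    """t statistic and degrees of freedom for two means with equal, unknown variances."""
    # Pooled estimate of the common variance
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = ((mean1 - mean2) - d0) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
    v = n1 + n2 - 2
    return t, v

# Example 4: classroom group (n = 12) versus programmed-materials group (n = 10)
t, v = pooled_t(85, 4, 12, 81, 5, 10)

T_CRITICAL = 1.725  # t-table value for v = 20, two-tailed test at alpha = 0.10
reject = abs(t) > T_CRITICAL
print(f"t = {t:.3f}, v = {v}, reject HO: {reject}")
```

Under these assumptions the statistic is a little above 2, which falls beyond the critical value, so the hypothesis of equal means would be rejected at the 0.10 level.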
Example 5: To determine whether membership in a fraternity is beneficial or detrimental to one’s
grades, the following grade-point averages were collected over a period of 5 years:
Year             1     2     3     4     5
Fraternity       2.0   2.0   2.3   2.1   2.4
Non-fraternity   2.2   1.9   2.5   2.3   2.4
Assuming the populations to be normal, test at the 0.05 level of significance whether
membership in a fraternity is detrimental to one’s grades.
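For paired observations such as those in Example 5, the test is carried out on the year-by-year differences. The sketch below assumes the differences are taken as fraternity minus non-fraternity, so that "detrimental" corresponds to the lower-tail alternative $\mu_D < 0$. The critical value −2.132 is the standard t-table entry for v = 4 with an area of 0.05 in the lower tail, entered by hand.

```python
from math import sqrt
from statistics import mean, stdev

fraternity = [2.0, 2.0, 2.3, 2.1, 2.4]
non_fraternity = [2.2, 1.9, 2.5, 2.3, 2.4]

# Differences for each year (fraternity minus non-fraternity, an assumed convention)
differences = [f - nf for f, nf in zip(fraternity, non_fraternity)]

n = len(differences)
d_bar = mean(differences)    # mean of the differences
s_d = stdev(differences)     # sample standard deviation of the differences
t = d_bar / (s_d / sqrt(n))  # test statistic with d0 = 0
v = n - 1                    # degrees of freedom

T_CRITICAL = -2.132  # t-table value for v = 4, lower-tail test at alpha = 0.05
reject = t < T_CRITICAL
print(f"t = {t:.2f}, v = {v}, reject HO: {reject}")
```

Here the statistic does not fall below the critical value, so under these assumptions there is insufficient evidence that fraternity membership is detrimental to grades.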
TESTS CONCERNING PROPORTION
Next, we may be interested in testing the proportion of successes in a population. For
instance, a politician may be interested in the proportion of citizens who will vote for him, or a
manufacturing firm may be interested in the proportion of defectives that arise from a sample of
products. For these cases we will be testing the null hypothesis
$p = p_0$
where p is the population proportion and $p_0$ is the specified value of the proportion being tested.
The possible alternative hypotheses are
$p < p_0$, $p > p_0$, and $p \neq p_0$.
Again we are conducting a binomial experiment on a sample, counting the number of
successes, and determining through this value if our null hypothesis is true or not.
To find our critical values, we would be using the binomial probabilities listed in the
binomial probability table. However, in most practical applications, the sample size is greater
than 30, and when n > 30, the normal probability distribution can be used to approximate the
binomial distribution. This is an easier way of getting critical values, and would prove to be
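accurate as long as we have a large sample size and the proportion being tested is not close to 0 or 1.
We then compute the test statistic
$z = \dfrac{x - n p_0}{\sqrt{n p_0 (1 - p_0)}}$
where x is the number of successes in the sample of size n and $p_0$ is the proportion specified by
the null hypothesis, and compare it with the critical values from the standard normal distribution.
The rejection criteria are summarized below:
Alternative Hypothesis    Reject HO if
$p < p_0$                 z is less than the lower critical value (area α to its left)
$p > p_0$                 z is greater than the upper critical value (area α to its right)
$p \neq p_0$              z is less than the lower critical value or greater than the upper critical value (area α/2 in each tail)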
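We may also be interested in testing the equality of two proportions. If so we would be
testing the null hypothesis
$p_1 = p_2$
against the alternatives
$p_1 < p_2$, $p_1 > p_2$, and $p_1 \neq p_2$
where $p_1$ and $p_2$ are the proportions of success in the two populations.
The test statistic to be used is
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
where $\hat{p}_1$ and $\hat{p}_2$ are the proportions of success from the two samples, with the respective
sample sizes $n_1$ and $n_2$, and $\hat{p}$ is computed as
$\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$
where $x_1$ and $x_2$ are the number of successes from each sample. Lastly, $\hat{q} = 1 - \hat{p}$. The
rejection criteria are summarized below.
Alternative Hypothesis    Reject HO if
$p_1 < p_2$               z is less than the lower critical value (area α to its left)
$p_1 > p_2$               z is greater than the upper critical value (area α to its right)
$p_1 \neq p_2$            z is less than the lower critical value or greater than the upper critical value (area α/2 in each tail)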
Example 1: A commonly prescribed drug on the market for relieving nervous tension is believed
to be only 60% effective. Experimental results with a new drug administered to a random
sample of 100 adults who were suffering from nervous tension showed that 70 received relief. Is
this sufficient evidence to conclude that the new drug is superior to the one commonly
prescribed? Use a 0.05 level of significance.
Example 2: A vote is to be taken among the residents of a town and the surrounding county to
determine whether a civic center will be constructed. To determine if there is a significant
difference in the proportion of town voters and county voters favoring the proposal, a poll is
taken. If 120 of 200 town voters favor the proposal and 240 of 500 county residents favor it,
would you agree that the proportion of town voters favoring the proposal is higher than the
proportion of county voters? Use a 0.025 level of significance.
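The two proportion examples above can be worked through with the normal approximation. The following sketch uses the number-of-successes form of the single-proportion statistic and the pooled form of the two-proportion statistic; the function names are hypothetical, and both tests are treated as upper-tail tests, which is how the two questions are worded.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(x, n, p0):
    """z statistic for HO: p = p0, given x successes in n trials."""
    return (x - n * p0) / sqrt(n * p0 * (1 - p0))

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for HO: p1 = p2, using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    q_hat = 1 - p_hat
    return (p1_hat - p2_hat) / sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))

# Example 1: 70 of 100 patients relieved, tested against p0 = 0.60 at alpha = 0.05
z1 = one_proportion_z(70, 100, 0.60)
crit1 = NormalDist().inv_cdf(1 - 0.05)

# Example 2: 120 of 200 town voters versus 240 of 500 county voters at alpha = 0.025
z2 = two_proportion_z(120, 200, 240, 500)
crit2 = NormalDist().inv_cdf(1 - 0.025)

print(f"Example 1: z = {z1:.2f}, critical value = {crit1:.3f}, reject HO: {z1 > crit1}")
print(f"Example 2: z = {z2:.2f}, critical value = {crit2:.3f}, reject HO: {z2 > crit2}")
```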
TESTS CONCERNING VARIANCES
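We may also be interested in testing the uniformity of a certain population, or in
comparing the uniformity of two populations. For a single population, we would be testing if the
population variance σ² is equal to a specified value $\sigma_0^2$. Hence the null hypothesis would be
$\sigma^2 = \sigma_0^2$
against the alternatives
$\sigma^2 < \sigma_0^2$, $\sigma^2 > \sigma_0^2$, and $\sigma^2 \neq \sigma_0^2$.
Assuming the population distribution to be approximately normal, we can use the χ² test
statistic given by
$\chi^2 = \dfrac{(n - 1)s^2}{\sigma_0^2}$
where $s^2$ is the sample variance, and n is the sample size. We then compare this statistic with
the critical values taken from the χ²-table with $v = n - 1$ degrees of freedom. The rejection
criteria are as follows:
Alternative Hypothesis          Reject HO if
$\sigma^2 < \sigma_0^2$         χ² is less than the lower critical value (area α to its left)
$\sigma^2 > \sigma_0^2$         χ² is greater than the upper critical value (area α to its right)
$\sigma^2 \neq \sigma_0^2$      χ² is less than the lower critical value or greater than the upper critical value (area α/2 in each tail)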
Example 1: A manufacturer of car batteries claims that the life of his batteries has a variance
equal to 0.81 years. If a random sample of 10 of these batteries has a variance of 1.44 years,
do you think that σ² > 0.81? Use a 0.05 level of significance.
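A minimal sketch of the χ² computation for this battery example follows, assuming the alternative hypothesis is that the population variance exceeds 0.81 (an upper-tail test). The function name is hypothetical, and the critical value 16.919 is the standard χ²-table entry for v = 9 with an area of 0.05 in the upper tail, entered by hand.

```python
def chi_square_statistic(sample_variance, n, sigma0_squared):
    """Chi-square statistic for testing a single variance."""
    return (n - 1) * sample_variance / sigma0_squared

# Battery example: claimed variance 0.81, sample variance 1.44, n = 10
chi2 = chi_square_statistic(1.44, 10, 0.81)
v = 10 - 1  # degrees of freedom

CHI2_CRITICAL = 16.919  # chi-square table value for v = 9, upper tail, alpha = 0.05
reject = chi2 > CHI2_CRITICAL
print(f"chi-square = {chi2:.2f}, v = {v}, reject HO: {reject}")
```

The statistic comes out close to 16, just short of the critical value, so under these assumptions the null hypothesis would not be rejected.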
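When testing the equality of the variances of two populations, the null hypothesis is
$\sigma_1^2 = \sigma_2^2$
which will be tested against any of the alternatives
$\sigma_1^2 < \sigma_2^2$, $\sigma_1^2 > \sigma_2^2$, and $\sigma_1^2 \neq \sigma_2^2$.
For independent random samples of size $n_1$ and $n_2$ for the two populations, the test
statistic is given by
$f = \dfrac{s_1^2}{s_2^2}$
where $s_1^2$ is the variance of the first sample and $s_2^2$ is the variance of the second sample. This
statistic will then be compared to critical values taken from the F-distribution. These rejection
criteria are summarized below:
Alternative Hypothesis            Reject HO if
$\sigma_1^2 < \sigma_2^2$         f is less than the lower critical value (area α to its left)
$\sigma_1^2 > \sigma_2^2$         f is greater than the upper critical value (area α to its right)
$\sigma_1^2 \neq \sigma_2^2$      f is less than the lower critical value, or greater than the upper critical value (area α/2 in each tail)
Note that the degrees of freedom are $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$.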
Example 2: In testing the equality of the population means in Example 4 under Tests
Concerning Means, we assumed that the two population variances are equal but unknown. Are
we justified in making this assumption? Use a 0.10 level of significance.
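The F statistic for this example can be sketched as follows, using the sample standard deviations 4 and 5 from Example 4 under Tests Concerning Means. The function name is hypothetical. The two critical values are approximate F-table values entered by hand (about 3.11 for 11 and 9 degrees of freedom, and the reciprocal of about 2.90 for 9 and 11 degrees of freedom); they should be checked against the table being used.

```python
def f_statistic(s1, s2):
    """Ratio of two sample variances, given the sample standard deviations."""
    return s1**2 / s2**2

# Example 4 data: s1 = 4 with n1 = 12, s2 = 5 with n2 = 10
f = f_statistic(4, 5)
v1, v2 = 12 - 1, 10 - 1  # degrees of freedom

# Approximate table values for a two-tailed test at alpha = 0.10 (assumed, from an F table)
F_UPPER = 3.11      # area 0.05 to the right, with (11, 9) degrees of freedom
F_LOWER = 1 / 2.90  # area 0.05 to the left, from the reciprocal of the (9, 11) entry

reject = f < F_LOWER or f > F_UPPER
print(f"f = {f:.2f}, v1 = {v1}, v2 = {v2}, reject HO: {reject}")
```

Since the ratio lies between the two critical values, the assumption of equal variances would not be rejected under these table values.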
P-VALUE APPROACH
The p-value approach uses a single value called the p-value to determine whether or not
to reject the null hypothesis. This is the output generated by most statistical software. The
p-value is the probability of getting sample data at least as extreme as what was observed, given
that the null hypothesis is true. Hence, if we have a low p-value (close to 0), we may reject the
null hypothesis, and fail to reject it if the p-value is quite large.
Determining how small the p-value must be in order to reject the null hypothesis is not
easy, and may involve some subjectivity. As a general measure though, we usually compare the
p-value with the level of significance. Hence, we reject the null hypothesis if the p-value is less
than the level of significance, α.
For example, if our level of significance is 0.05, and we have a p-value of 0.0341, then
we would reject the null hypothesis. However, what happens if we have a p-value of say,
0.0612? We may say that we fail to reject the null hypothesis because the p-value is not less
than the level of significance. However, we may also want to reject the null hypothesis because
we still have a relatively low p-value (i.e., if the null hypothesis were true, we would get sample
data this extreme only 6.12% of the time). This is where subjectivity comes in, and different
conclusions may be made depending on the study being conducted.
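For tests based on the Z distribution, the p-value can be computed directly from the standard normal distribution. The sketch below is one way to do this with Python's standard library; the function name and tail labels are assumptions for this illustration. It is applied to z = −2.83, the statistic from the fishing line example under Tests Concerning Means.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """p-value of a z statistic for a "lower", "upper", or "two" tailed test."""
    cdf = NormalDist().cdf  # area to the left under the standard normal curve
    if tail == "lower":
        return cdf(z)
    if tail == "upper":
        return 1 - cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

alpha = 0.01
p_value = z_p_value(-2.83, "two")
print(f"p-value = {p_value:.4f}, reject HO: {p_value < alpha}")
```

The p-value here is below 0.01, so the p-value approach leads to the same decision as the critical value approach for that example.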
EXERCISES
1. The average height of females in the freshman class of a certain college has been 162.5
centimeters with a standard deviation of 6.9 centimeters. Is there reason to believe that
there has been a change in the average height if a random sample of 50 females in the
present freshman class has an average height of 165.2 centimeters, using a 0.05 level
of significance?
2. Test the hypothesis that the average content of containers of a particular lubricant is 10
liters if the contents of a random sample of 10 containers are 10.2, 9.7, 10.1, 10.3, 10.1,
9.8, 9.9, 10.4, 10.3, and 9.8 liters. Use a 0.01 level of significance and assume that the
distribution of contents is normal.
3. A manufacturer claims that the average tensile strength of thread A exceeds the average
tensile strength of thread B by at least 12 kilograms. To test this claim, 50 pieces of each
type of thread are tested under similar conditions. Type A thread had an average tensile
strength of 86.7 kilograms with a standard deviation of 6.28 kilograms, while type B
thread had an average tensile strength of 77.8 kilograms with a standard deviation of
5.61 kilograms. Test the manufacturer’s claim using a 0.05 level of significance.
4. A study is made to see if increasing the substrate concentration has an appreciable
effect on the velocity of a chemical reaction. With the substrate concentration of 1.5
moles per liter, the reaction was run 15 times with an average velocity of 7.5 micromoles
per 30 minutes and a standard deviation of 1.5. With a substrate concentration of 1.0
moles per liter, 12 runs were made yielding an average velocity of 8.8 micromoles per 30
minutes and a sample standard deviation of 1.2. Would you say that the increase in
substrate concentration increases the mean velocity by more than 0.5 micromoles per
30 minutes? Use a 0.01 level of significance and assume the populations to be
approximately normally distributed with equal variances.
5. A taxi company is trying to decide whether the use of radial tires instead of belted tires
improves fuel economy. Twelve cars were equipped with radial tires and driven over a
prescribed test course. Without changing drivers, the same cars were then equipped
with regular belted tires and driven once again over the same test course. The gasoline
consumption, in kilometers per liter, was recorded as follows:
Car   Radial Tires (km/L)   Belted Tires (km/L)
1 4.2 4.1
2 4.7 4.9
3 6.6 6.2
4 7.0 6.9
5 6.7 6.8
6 4.5 4.4
7 5.7 5.7
8 6.0 5.8
9 7.4 6.9
10 4.9 4.7
11 6.1 6.0
12 5.2 4.9
At the 0.025 level of significance, can we conclude that cars equipped with radial tires
give better fuel economy than those equipped with belted tires? Assume the populations
to be normally distributed.
6. A soft-drink dispensing machine is said to be out of control if the variance of the contents
exceeds 1.15 deciliters. If a random sample of 25 drinks from this machine has a
variance of 2.03 deciliters, does this indicate at the 0.05 level of significance that the
machine is out of control? Assume that the contents are approximately normally
distributed.
7. A study is conducted to compare the length of time it takes men and women to
assemble a certain product. Past experience indicates that the distributions of times for
both men and women are approximately normal but the variance of the times for women
is less than that for men. A random sample of times for 11 men and 14 women produced
the following data: the variance for men was 6.1 while the variance for women was 5.3.
Test the hypothesis that the two population variances are equal against the alternative that
the variance for men is greater than that for women, using a 0.01 level of
significance.
8. The gas company claims that two thirds of the houses in a certain city are heated by
natural gas. Do we have reason to doubt this claim if, in a random sample of 1000
houses in this city, it is found that 618 are heated by natural gas? Use a 0.02 level of
significance.
9. A geneticist is interested in the proportion of males and females in a population that have a certain minor blood disorder. In a random sample of 100 males, 31 are found to be afflicted, whereas only 24 of 100 females tested appear to have the disorder. Can we
conclude at the 0.01 level of significance that the proportion of men in the population afflicted with this blood disorder is significantly greater than the proportion of women afflicted?
BUSINESS APPLICATIONS
10. A firm is studying the delivery times of two raw material suppliers. The firm is basically satisfied with supplier A and is prepared to stay with that supplier. However, if the firm finds that the mean delivery time of supplier B is less than that of supplier A, it will begin making raw material purchases from supplier B. Assume that independent samples of 50 deliveries from each supplier show the mean delivery time of supplier A to be 14 days with a standard deviation of 3 days, while the mean delivery time of supplier B is shown to be 12.5 days with a standard deviation of 2 days. Testing at a 0.05 level of significance, should the firm switch to supplier B or not?
11. Starting annual salaries for individuals with master’s and bachelor’s degrees were collected in two independent random samples. The average starting salary of a random sample of 60 individuals with master’s degrees showed a mean of $45,000 with a standard deviation of $4000, while those with bachelor’s degrees had a mean starting salary of $35,000 with a standard deviation of $3500. Test at a 0.05 level of significance if those with master’s degrees had a significantly higher average starting salary than those individuals with bachelor’s degrees.
12. Starting annual salaries for individuals entering the public accounting and financial planning professions were presented in Fortune, June 26, 1995. The starting salaries for a sample of 12 public accountants and a sample of 14 financial planners follow, with data in thousands of dollars.
Public Accountant:
30.6 31.2 28.9 35.2 25.1 33.2 31.3 35.3 31.0 30.1 29.9 24.4
Financial Planner:
31.6 26.6 25.5 25.0 25.9 32.9 26.9 25.8 27.5 29.6 23.9 26.9 24.4 25.5
Test if the starting annual salaries of the accountants and financial planners are equal at
a 0.05 level of significance.
13. Rental car gasoline prices per gallon were sampled at eight major airports. Data for Hertz and National car rental companies follow (USA Today, April 4, 2000).
Airport Hertz National
Boston Logan 1.55 1.56
Chicago O’Hare 1.62 1.59
Los Angeles 1.72 1.78
Miami 1.65 1.49
New York (JFK) 1.72 1.51
New York (LaGuardia) 1.67 1.50
Orange County 1.68 1.77
Washington 1.52 1.41
Test at a 0.05 level of significance if there is a difference between the gas prices of the
two rental car companies.
14. Figure Perfect Inc., is a women’s figure salon that specializes in weight reduction programs. Weights for a sample of clients before and after a 6-week introductory program are shown here.
Client Weight Before Weight After
1 140 132
2 160 158
3 210 195
4 148 152
5 190 180
6 170 164
Determine at a 0.05 level of significance whether the introductory program provided a
statistically significant weight loss.
15. Yahoo! Internet Life sponsored surveys in several metropolitan areas to estimate the proportion of adults using the Internet at work (USA Today, May 7, 2000). Results showed 96 of 240 Washington D.C. adults use the Internet at work, while 80 of 250 San Francisco adults use the Internet at work. Do the sample results indicate that the population proportion of adults using the Internet at work in Washington D.C. is greater than the population proportion in San Francisco? Use a 0.05 level of significance.
16. A Business Week/Harris survey asked senior executives at large corporations their opinions about the economic outlook for the future. One question was “Do you think that there will be an increase in the number of full-time employees at your company over the next 12 months?” In May 1997, 220 of 400 executives answered yes, while in December 1996, 192 of 400 executives had answered yes. Test at a 0.04 level of significance if there was a significant increase in the proportion of executives who answered yes from December 1996 to May 1997.
17. The standard deviation in the 12-month earnings per share for 10 companies in the airline industry was 4.27, and the standard deviation in the 12-month earnings per share for 7 companies in the automotive industry was 2.27. Conduct a test for equal variances at a 0.10 level of significance.
18. On the basis of data provided by a Romac salary survey, the variance in annual salaries for seniors in public accounting firms is approximately 2.1 and the variance in annual salaries for managers in public accounting firms is 11.1 (data in thousands of dollars). Assuming that the salary data were based on samples of 25 seniors and 25 managers, test the hypothesis that the population variances of the salaries are equal. Use a 0.02 level of significance.