Engineering Statistics Chapter 3 Distribution of Samples Distribution of sample statistics 3C - Proportions and Difference between Proportions


Engineering Statistics Chapter 3

Distribution of Samples

Distribution of sample statistics

3C - Proportions and Difference between Proportions


Proportion of a property

• When a sample is collected in relation to a property, it is important to know whether the proportion observed is reasonable. For example, when we interview a group of people for work, we would like to know whether the proportions of candidates by gender, age, race, etc. are normal.

• The proportion of a property is highly dependent on the sample size. In small samples, it is not surprising if the sample proportion is unusual. As the sample size increases, we expect the proportion to be closer to that of the population.


Distribution of proportions

• If the proportion of a property in a population is π, and we take samples of size n, then the sample proportion p is expected to follow the normal distribution with mean π and variance π(1 – π)/n.

• As can be seen, the variance decreases as the sample size increases. When n is large, we would expect the proportion of the property in the sample to be very close to that of the population.
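The shrinking spread can be checked numerically. The following is a minimal sketch; the value 0.3 for the population proportion and the sample sizes are illustrative assumptions, not values from the slides.

```python
import math

def proportion_sd(pi: float, n: int) -> float:
    """Standard deviation of the sample proportion: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)

# Assumed population proportion, for illustration only.
pi = 0.3
for n in (10, 40, 160, 640):
    print(f"n = {n:4d}  sd = {proportion_sd(pi, n):.4f}")
```

Each fourfold increase in n halves the standard deviation, since the variance is inversely proportional to n.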


Need for Continuity Adjustment

• Since the proportion is based on a ratio m/n, the value of m will be an integer. In order to avoid bias in obtaining the correct proportion, it is necessary to introduce a correction of ½ unit. This is the same as for continuity correction in discrete-to-continuous approximation.

• Thus we shall treat p > m/n as p > (m+½)/n, p ≥ m/n as p ≥ (m – ½)/n, p < m/n as p < (m – ½)/n, and p ≤ m/n as p ≤ (m+½)/n.
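The four cases of the adjustment can be written as a small helper. This is only a sketch: the function name `adjust_cutoff` and the string codes for the relations are hypothetical choices made for illustration.

```python
def adjust_cutoff(m: float, n: int, relation: str) -> float:
    """Return the continuity-adjusted cutoff for an event 'p <relation> m/n'.

    '>' and '<=' move the cutoff up by half a unit; '>=' and '<' move it down.
    """
    if relation in (">", "<="):
        return (m + 0.5) / n
    if relation in (">=", "<"):
        return (m - 0.5) / n
    raise ValueError(f"unknown relation: {relation!r}")

# Illustration: 10 out of 40, so m/n = 0.25.
print(adjust_cutoff(10, 40, "<="))  # cutoff moves up to (10 + 0.5)/40
print(adjust_cutoff(10, 40, "<"))   # cutoff moves down to (10 - 0.5)/40
```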


Example 1

• 30% of customers at a fast-food restaurant are elderly people who are given discounts. During a short period, the restaurant serves 40 customers. What is the probability that the percentage of elderly customers is not more than 25%?

Solution: p ~ N(0.3, 0.3×(1–0.3)/40) = N(0.3, 0.00525).

P(p ≤ 0.25) → P(p ≤ 0.25 + 0.5/40) [Continuity adjustment]
= P(z ≤ [0.2625 – 0.3]/√0.00525)
= P(z ≤ –0.52)
= 0.5 – 0.1985 = 0.3015.
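The calculation in Example 1 can be reproduced with Python's standard library. This is a sketch under the same normal model; the helper name `prob_at_most` is hypothetical. Because the code does not round z to two decimals, its result differs slightly from the table value 0.3015.

```python
import math
from statistics import NormalDist

def prob_at_most(pi: float, n: int, m: float) -> float:
    """P(p <= m/n) under the normal model, with a +1/2 continuity adjustment."""
    sd = math.sqrt(pi * (1 - pi) / n)
    cutoff = (m + 0.5) / n
    return NormalDist(mu=pi, sigma=sd).cdf(cutoff)

# Example 1: pi = 0.30, n = 40, and 25% of 40 customers is m = 10.
example_1 = prob_at_most(0.30, 40, 10)
print(f"P(p <= 0.25) is about {example_1:.4f}")
```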


Example 2

• A furniture factory claims that less than 12% of its executive chairs have defects. An office has just ordered 25 such chairs. Taking the defect rate as 12%, what is the probability that the percentage of defective chairs exceeds 15%?

Solution: p ~ N(0.12, 0.12×(1–0.12)/25) = N(0.12, 0.004224).

P(p > 0.15) → P(p > 0.15 + 0.5/25) [Continuity adjustment]
= P(z > [0.17 – 0.12]/√0.004224)
= P(z > 0.77)
= 0.5 – 0.2794 = 0.2206.


Example 3

• It is estimated that 65% of students in the Faculty of Education are ladies. A class in FoE has 120 students. What is the probability the proportion of ladies in the class exceeds 70%?

Solution: Let p represent the proportion of ladies; then p ~ N(0.65, [0.65×(1–0.65)]/120). After continuity correction,

P(p > 0.70) → P(p > 0.70 + 0.5/120)
= P(z > [0.7042 – 0.65]/√(0.65×0.35/120))
= P(z > 1.24)
= 0.5 – 0.3925 = 0.1075.


Alternative: Binomial distribution.

We note that the same question can be solved using binomial distribution as follows:

Let X represent the number of ladies, so X ~ Bin(120, 0.65). As n > 30, X is approximated by the normal distribution X ~ N(120×0.65, 120×0.65×0.35) = N(78, 27.3).

70% of 120 is 84, so we are looking for P(X > 84).

By continuity adjustment, we have

P(X > 84.5) = P(z > [84.5 – 78]/√27.3)

= P(z > 1.24) = 0.1075, as we obtained earlier.
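As a rough check on the approximation, the exact binomial tail can be summed directly and compared with the continuity-adjusted normal value. This sketch uses only the figures of Example 3; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, pi = 120, 0.65   # class size and population proportion from Example 3
k = 84              # 70% of 120

# Exact binomial tail: P(X > 84) = sum of P(X = x) for x = 85, ..., 120.
exact = sum(
    math.comb(n, x) * pi**x * (1 - pi) ** (n - x)
    for x in range(k + 1, n + 1)
)

# Normal approximation with the half-unit continuity adjustment.
approx = 1 - NormalDist(mu=n * pi, sigma=math.sqrt(n * pi * (1 - pi))).cdf(k + 0.5)

print(f"exact binomial tail : {exact:.4f}")
print(f"normal approximation: {approx:.4f}")
```

The two values should lie close together, which is the justification for using the normal model when n is large.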


Example 4

• 18% of students withdraw half-way through a course. In a class of 45 students, what is the probability that fewer than 15% will withdraw?

Solution: p ~ N(0.18, 0.18×(1–0.18)/45) = N(0.18, 0.00328).

After continuity adjustment, the event

p < 0.15 → p < 0.15 – 0.5/45

P(p < 0.1389) = P(z < [0.1389 – 0.18]/√0.00328)

= P(z < –0.72) = 0.5 – 0.2642 = 0.2358.


Binomial Alternative:

Let W represent the number of students who withdraw. Then W~Bin(45, 0.18).

15% of 45 is 6.75, so the event is W < 6.75. Even though the number here is a decimal, we still make the same continuity adjustment. Thus we look for W < 6.75 – 0.5.

As n > 30, we use the approximation W ~ N(45×0.18, 45×0.18×0.82) = N(8.1, 6.642).

P(W < 6.25) = P(z < [6.25 – 8.1]/√6.642)

= P(z < –0.72) = 0.2358, as found above.


Difference between proportions

• The same rules on the distribution of the difference between means apply to the difference between proportions. Thus if π1 and π2 are the proportions of the same property in two populations, and we take samples of sizes n1 and n2 from those two populations respectively, then we expect the difference of the sample proportions p1 – p2 to satisfy

p1 – p2 ~ N(π1 – π2, π1(1 – π1)/n1 + π2(1 – π2)/n2).


Example 5

• In the 1985 cohort, it is known that 20% of non-graduates and 14% of graduates remain unemployed 6 months after coming on to the market. A survey tracks 80 non-graduates and 50 graduates of the cohort. Find the probability that the percentage of non-graduates who remain unemployed exceeds that of graduates by at least 10%.


Solution:

Let pn represent the proportion of unemployed non-graduates and pg that of unemployed graduates.

pn – pg ~ N(0.20 – 0.14, 0.2×0.8/80 + 0.14×0.86/50)

P(pn – pg > 0.1) = P(z > [0.1 – 0.06]/√(0.2×0.8/80 + 0.14×0.86/50))

= P(z > 0.60) = 0.5 – 0.2257 = 0.2743.
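The difference-of-proportions calculation in Example 5 can be sketched as a reusable function. The name `prob_diff_exceeds` is a hypothetical helper, and no continuity adjustment is applied, in line with the worked solution. The result is computed from the unrounded z, so it is close to, but not identical with, the table value 0.2743.

```python
import math
from statistics import NormalDist

def prob_diff_exceeds(pi1: float, n1: int, pi2: float, n2: int, d: float) -> float:
    """P(p1 - p2 > d) when p1 - p2 ~ N(pi1 - pi2, pi1(1-pi1)/n1 + pi2(1-pi2)/n2)."""
    mean = pi1 - pi2
    sd = math.sqrt(pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2)
    return 1 - NormalDist(mu=mean, sigma=sd).cdf(d)

# Example 5: 20% of 80 non-graduates versus 14% of 50 graduates, gap of 10%.
example_5 = prob_diff_exceeds(0.20, 80, 0.14, 50, 0.10)
print(f"P(pn - pg > 0.10) is about {example_5:.4f}")
```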


Example 6

• The Transport Ministry believes that 35% of express buses exceed speed limits on the highway. On a certain day, two teams track express buses going in opposite directions. The team for north-bound traffic monitors 60 buses, while the south-bound team has 75 buses on record. What is the probability that the percentage of speeding buses for north-bound exceeds that of south-bound by at least 4%?

Solution: Let pn represent the proportion of north-bound buses which speed, and ps the same proportion for south-bound buses.


pn – ps ~ N(0.35 – 0.35, 0.35×0.65/60 + 0.35×0.65/75)

P(pn – ps > 0.04) = P(z > [0.04 – 0]/√(0.35×0.65/60 + 0.35×0.65/75))

= P(z > 0.48) = 0.5 – 0.1844 = 0.3156.

So there is a probability of 0.3156 that the north-bound speeding percentage exceeds that of south-bound by 4% or more.

Note that in this case, because both directions share the same population proportion, the distribution of the difference is symmetric about 0, so there is the same probability 0.3156 that the proportion of south-bound speeders exceeds that of north-bound by 4% or more!


Confidence Interval for Proportion

• When we have the proportion of a property from the population, we expect the proportion for a sample to follow the normal distribution.

• Hence, we may apply the same procedure to estimate the (1–α)100% confidence interval as for the mean. We shall use two examples to illustrate the method.


Example 7

• The Tourism Department reports that 32% of tourists are foreigners. A group of 150 tourists is visiting the Royal Museum. What is the 95% confidence interval for the percentage of foreign tourists?

Solution: p ~ N(0.32, 0.32×0.68/150); p ~ N(0.32, 0.001451).

At 95% confidence, α = 0.05, α/2 = 0.025, and z0.025 = 1.96. Hence the 95% confidence interval for the proportion of foreign tourists is

0.32 – 1.96×√0.001451 ≤ p ≤ 0.32 + 1.96×√0.001451
0.2453 ≤ p ≤ 0.3947

24.53% to 39.47% of the tourists are foreigners.
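The interval in Example 7 can be reproduced as follows. This is a sketch only: `interval_about_pi` is a hypothetical helper, and the critical value comes from `NormalDist().inv_cdf` rather than from a table, so it uses 1.959964 instead of the rounded 1.96. The confidence level is the 95% used in the solution.

```python
import math
from statistics import NormalDist

def interval_about_pi(pi: float, n: int, confidence: float) -> tuple[float, float]:
    """Interval pi -/+ z * sqrt(pi(1-pi)/n) for the proportion in a sample of size n."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)
    margin = z * math.sqrt(pi * (1 - pi) / n)
    return pi - margin, pi + margin

# Example 7: pi = 0.32, n = 150, at the 95% level.
low, high = interval_about_pi(0.32, 150, 0.95)
print(f"{low:.4f} <= p <= {high:.4f}")
```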


Example 8

• The records of a bank show that 17% of its customers are business customers, but this group accounts for 75% of transactions. During a certain hour, there were 50 customers and 400 transactions. Find the 90% confidence interval for the percentage of

(i) Business customers;

(ii) Business transactions.


Solution:
p1 = proportion of business customers;
p2 = proportion of business transactions.
p1 ~ N(0.17, 0.17×0.83/50) = N(0.17, 0.002822);
p2 ~ N(0.75, 0.75×0.25/400) = N(0.75, 0.00046875).

At 90% confidence, α = 0.1, α/2 = 0.05, and z0.05 = 1.6449. The confidence intervals are:

0.17 – 1.6449×√0.002822 ≤ p1 ≤ 0.17 + 1.6449×√0.002822
0.0826 ≤ p1 ≤ 0.2574; and
0.75 – 1.6449×√0.00046875 ≤ p2 ≤ 0.75 + 1.6449×√0.00046875
0.7144 ≤ p2 ≤ 0.7856.

Hence the range is 8.26% to 25.74% for business customers, and 71.44% to 78.56% for business transactions.


Confidence Interval From Sample

• When the proportions are derived from sample data, we expect that the same normal distribution can be used to model the population proportion, using the sample proportion as the estimator.

• For such purposes, we expect the result will be good only if the sample size is reasonably large. For small samples, it is not reliable to use the proportion obtained to obtain a general picture of the population proportion.


Example 9

• In a survey on cleanliness of eating stalls, it was found that only 55 out of 140 stalls checked follow proper procedures to maintain hygienic environments. Based on this, estimate the 95% confidence interval for the percentage of clean eating stalls nationwide.


Solution:

Even though only the sample data are available, we can safely assume that the proportion from such a big sample is a good estimator for the wider proportion. Hence we shall use the normal distribution to estimate the proportion for the nation:

p ~ N(55/140, [55/140×85/140]/140)

At 95%, α = 0.05, α/2 = 0.025, and z0.025 = 1.96.

So the 95% interval for the population proportion of clean eateries is 55/140 – 1.96×√([55/140×85/140]/140) to 55/140 + 1.96×√([55/140×85/140]/140):

0.3120 ≤ p ≤ 0.4738, or 31.20% to 47.38%.
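When only counts from a sample are available, the same interval can be computed from the counts directly. The sketch below reproduces Example 9 and, to show why small samples give an unreliable picture, repeats the calculation for a hypothetical survey of 28 stalls with the same sample proportion (11 clean stalls). The function name and the second data set are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def sample_proportion_ci(count: int, n: int, confidence: float) -> tuple[float, float]:
    """Interval p_hat -/+ z * sqrt(p_hat(1 - p_hat)/n), with p_hat = count/n."""
    p_hat = count / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Example 9: 55 clean stalls out of 140 checked.
large = sample_proportion_ci(55, 140, 0.95)
# Hypothetical smaller survey with the same proportion, 11 out of 28.
small = sample_proportion_ci(11, 28, 0.95)

print(f"n = 140: {large[0]:.4f} to {large[1]:.4f}")
print(f"n =  28: {small[0]:.4f} to {small[1]:.4f}")
```

The interval for the smaller survey is noticeably wider, even though the sample proportion is identical.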


Example 10

• During a screening process, it was found that 20 out of 80 boys aged 15-18 and 30 out of 100 girls of the same age group are overweight. Based on this study, find the probability that the proportion of overweight girls exceeds that of boys by 2% or more.

NOTE: In this case, we only have the sample proportions. However, as the sample sizes are large enough, we can use these data to project the likely distribution of the difference of proportions.


Solution: Note: 20/80 = 0.25, 30/100 = 0.3.

pb~N(0.25, 0.25×0.75/80);

pg~N(0.3, 0.3×0.7/100);

pg – pb ~ N(0.3-0.25, 0.25×0.75/80 + 0.3×0.7/100)

P(pg – pb > 0.02)

= P(z > [0.02 – 0.05]/√(0.25×0.75/80 + 0.3×0.7/100))

= P(z > –0.45)

= 0.5 + 0.1736

= 0.6736.


Difference of proportions

• Using the distribution for difference between proportions, we can find the probability for the difference between proportions (Exs 11 & 12).

• When the sample sizes are large, we can also use the sample proportions to estimate the interval for the difference between population proportions. The same procedure is used to determine the confidence interval for the difference in proportions (Ex 13).


Example 11

• On the average, 37% of men and 18% of women in the country smoke. A survey is taken for 50 men and 60 women. What is the probability the proportion of men who smoke exceeds that of women by at least 20%?

Solution: Let pm and pw represent the proportion of men and women who smoke. Then

pm ~ N(0.37, 0.37×0.63/50);

pw ~ N(0.18, 0.18×0.82/60).


Example 11 (Solution)

This means that

pm – pw ~N(0.37 – 0.18, 0.37×0.63/50+ 0.18×0.82/60).

So P(pm ≥ pw + 0.20) = P(pm – pw ≥ 0.20)
= P(z ≥ [0.20 – 0.19]/√(0.37×0.63/50 + 0.18×0.82/60))
= P(z ≥ 0.12)
= 0.5 – 0.0478 = 0.4522.


Example 12

• 65% of those achieving good results in the STPM exam and 55% of those in the Matriculation exam get admitted to the universities of their choice. A check is made on 72 students successful at STPM and 45 of those at Matriculation. What is the probability that the success rate in university admission for those through Matriculation is at least as good as that for those through STPM?

• Solution: Let ps be the proportion of STPM candidates who are successful and pm that of Matriculation candidates.


Example 12 (Solution)

Then we have:

ps ~ N(0.65, 0.65×0.35/72);

pm ~ N(0.55, 0.55×0.45/45). And so

pm – ps ~ N(0.55 – 0.65, 0.65×0.35/72 + 0.55×0.45/45).

Hence P(pm ≥ ps) = P(pm – ps ≥ 0.0)
= P(z ≥ [0.0 – (–0.10)]/√(0.65×0.35/72 + 0.55×0.45/45))
= P(z ≥ 1.07)
= 0.5 – 0.3577 = 0.1423.


Example 13

• Out of 75 sticks of LajuMaut cigarettes, 20 are found to have nicotine exceeding danger levels. For 60 sticks of CepatMaut cigarettes, 15 are also found to have nicotine exceeding danger levels. What is the 90% confidence interval of pL – pC, where pL and pC represent the proportions of LajuMaut and CepatMaut cigarettes with excessive levels of nicotine?


• From the data given, pL=20/75 = 0.2667, and pC=15/60 = 0.25. By theory, pL –pC ~N(0.2667–0.25, 0.2667×0.7333÷75 + 0.25×0.75÷60).

• At 90% confidence, α = 0.1, α/2 = 0.05, and z0.05 = 1.6449. Hence the confidence interval for the difference in proportions is from 0.0167 – 1.6449×√(0.2667×0.7333÷75 + 0.25×0.75÷60) to 0.0167 + 1.6449×√(0.2667×0.7333÷75 + 0.25×0.75÷60), i.e. –0.1078 to 0.1412.

• NOTE: The left boundary –0.1078 indicates that pL may actually be less than pC.
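The interval for a difference of proportions can be sketched in the same way, working from the raw counts. The helper name `diff_ci` is hypothetical. Since the code keeps full precision for 20/75 rather than the rounded 0.2667, its lower bound may differ from the slide value in the fourth decimal place.

```python
import math
from statistics import NormalDist

def diff_ci(c1: int, n1: int, c2: int, n2: int, confidence: float) -> tuple[float, float]:
    """Interval for p1 - p2 from two samples with c1/n1 and c2/n2 observed."""
    p1, p2 = c1 / n1, c2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin

# Example 13: 20 of 75 sticks versus 15 of 60 sticks, at the 90% level.
lower, upper = diff_ci(20, 75, 15, 60, 0.90)
print(f"{lower:.4f} to {upper:.4f}")
print("interval contains 0:", lower < 0 < upper)
```

An interval that contains 0 corresponds to the note above: the data do not rule out pL being smaller than pC.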


Multiple Groups

• When we want to compare the proportions of multiple (3 or more) groups in a population, the method using the normal distribution becomes ineffective.

• An alternative is to use the differences between what is expected and what is obtained, and treat them as variations.

• The sum of squares of the differences can be modeled using the χ² distribution. However, as χ² distribution tables do not provide probabilities, we shall only look at these cases in hypothesis testing. (See 4C).