devan-brindle
Estimation of Means and Proportions
Concepts
• Estimator: a rule that tells us how to estimate a value for a population parameter using sample data
• Estimate: a specific value of an estimator for particular sample data
Concepts
• A point estimator is a rule that tells us how to calculate a particular number from sample data to estimate a population parameter
• An interval estimator is a rule that tells us how to calculate two numbers based on sample data, forming a confidence interval within which the parameter is expected to lie
Properties of a Good Estimator
• Unbiasedness: mean of the sampling distribution of the estimator equals the true value of the parameter
• Efficiency: The most efficient estimator among a group of unbiased estimators is the one with the smallest variance
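Both properties can be illustrated with a small simulation. The sketch below is not part of the original slides: it assumes a normal population with hypothetical parameters (mu = 10, sigma = 1) and compares two estimators of the population mean, the sample mean and the sample median. The function name and all parameter values are illustrative choices.

```python
import random
import statistics

def simulate_estimators(mu=10.0, sigma=1.0, n=25, reps=2000, seed=0):
    """Draw `reps` samples of size n from a normal population and return
    the simulated sampling distributions of two estimators of mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return means, medians

means, medians = simulate_estimators()

# Unbiasedness: the centre of each sampling distribution should be near mu.
print("mean of sample means:  ", statistics.fmean(means))
print("mean of sample medians:", statistics.fmean(medians))

# Efficiency: compare the variances of the two sampling distributions.
print("variance of sample means:  ", statistics.variance(means))
print("variance of sample medians:", statistics.variance(medians))
```

For a normal population both estimators centre on mu, but the sample mean should show the smaller variance, which is what makes it the more efficient of the two in this setting.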
Estimation of a Population Mean
• The CLT suggests that the sample mean may be a good estimator for the population mean. The CLT says that:
– Sampling distribution of sample mean will be approximately normally distributed regardless of the distribution of the sampled population if n is large
– The sample mean is an unbiased estimator
– The standard error of the sample mean is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
Estimation of a Population Mean
• A point estimator of the population mean is: $\hat{\mu} = \bar{x}$
• An interval estimator of the population mean is a $(1-\alpha) \times 100\%$ confidence interval: $\bar{x} \pm z_{\alpha/2} \, \sigma/\sqrt{n}$
– In repeated sampling, intervals constructed this way contain the true population parameter $(1-\alpha) \times 100\%$ of the time
– $z_{\alpha/2}$ is the z value corresponding to an area $\alpha/2$ in the upper tail of a standard normal distribution
Estimation of a Population Mean
• Usually σ (the population standard deviation) is unknown.
– If n is large enough (n ≥ 30) then we can approximate it with the sample standard deviation s.
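As a rough sketch of the large-sample interval, the function below computes $\bar{x} \pm z_{\alpha/2} \, s/\sqrt{n}$ from summary statistics, with s standing in for σ. The function name and the numbers in the usage line are hypothetical; `NormalDist.inv_cdf` from Python's standard library supplies the z value.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_mean_interval(x_bar, s, n, alpha=0.05):
    """(1 - alpha) * 100% confidence interval for a population mean,
    using s in place of sigma (large-sample case, n >= 30)."""
    if n < 30:
        raise ValueError("large-sample interval assumes n >= 30")
    # z_{alpha/2}: the z value with area alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical summary statistics, for illustration only.
low, high = large_sample_mean_interval(x_bar=50.0, s=10.0, n=100, alpha=0.05)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```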
One Sided Confidence Intervals
• In some cases we may be interested in the probability the population parameter falls above or below a certain value
• Lower One Sided Confidence Interval (LCL):
– $\text{LCL} = (\text{point estimate}) - z_{\alpha} \cdot \sigma/\sqrt{n}$
• Upper One Sided Confidence Interval (UCL):
– $\text{UCL} = (\text{point estimate}) + z_{\alpha} \cdot \sigma/\sqrt{n}$
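A minimal sketch of the one-sided limits, assuming the standard error of the point estimate is already known. The function name and example values are hypothetical. Note that the whole tail area α sits on one side, so the z value is $z_{\alpha}$ rather than $z_{\alpha/2}$.

```python
from statistics import NormalDist

def one_sided_limits(point_estimate, std_error, alpha=0.05):
    """Return (LCL, UCL): the lower and upper one-sided confidence limits."""
    z = NormalDist().inv_cdf(1 - alpha)  # z_alpha: area alpha in the upper tail
    lcl = point_estimate - z * std_error
    ucl = point_estimate + z * std_error
    return lcl, ucl

# Hypothetical values: point estimate 50 with standard error 1.
lcl, ucl = one_sided_limits(50.0, 1.0, alpha=0.05)
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```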
Small Sample Estimation of a Population Mean
• If n is large, we can use the sample standard deviation s as a reliable estimator of the population standard deviation σ
– No matter what distribution the population has, the sampling distribution of the sample mean is approximately normally distributed
• As the sample size n decreases, the sample standard deviation s becomes a less reliable estimator of the population standard deviation (because we are using less information from the underlying distribution to compute s)
• How do we deal with this issue?
t Distribution
• Assume
(1) The underlying population is normally distributed
(2) Sample is small and σ is unknown
• Using the sample standard deviation s to replace σ, the t statistic $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ follows the t-distribution
Properties of the t Distribution
• mound-shaped
• perfectly symmetric about t=0
• more variable than z (the standard normal distribution)
• affected by the sample size n (as n increases s becomes a better approximation for σ)
• n-1 is the degrees of freedom (d.f.) associated with the t statistic
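To make the definition concrete, the sketch below computes the t statistic and its degrees of freedom from raw data. The sample values and the hypothesised mean `mu0` are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """Return (t, df) where t = (x_bar - mu0) / (s / sqrt(n)) and df = n - 1."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (x_bar - mu0) / (s / sqrt(n))
    return t, n - 1

# Hypothetical small sample of n = 8 measurements.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
t, df = t_statistic(sample, mu0=4.0)
print(f"t = {t:.4f} with {df} degrees of freedom")
```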
More on the t Distribution
• Remember the t-distribution is based on the assumption that the sampled population possesses a normal probability distribution.
– This is a very restrictive assumption.
• Fortunately, it can be shown that for non-normal but mound-shaped distributions, the distribution of the t statistic is nearly the same shape as the theoretical t-distribution for a normal distribution.
• Therefore the t distribution is still useful for small sample estimation of a population mean even if the underlying distribution of x is not known to be normal
How to use the t-distribution table
• The t-distribution table is in the book (Appendix II, Table 4, p. 611). $t_{\alpha}$ is the value of t such that an area α lies to its right.
• To use the table:
– Determine the degrees of freedom
– Determine the appropriate value of α
– Look up the value for $t_{\alpha}$
Table: t Distribution
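The lookup steps can be sketched in code. The dictionary below holds a small excerpt of standard t-table values (upper-tail areas .050 and .025, 1 to 10 degrees of freedom). It is typed in here for illustration and is not a copy of the book's table. The helper names are hypothetical.

```python
from math import sqrt

# Excerpt of a standard t table: df -> (t_.050, t_.025).
T_TABLE = {
    1: (6.314, 12.706), 2: (2.920, 4.303), 3: (2.353, 3.182),
    4: (2.132, 2.776), 5: (2.015, 2.571), 6: (1.943, 2.447),
    7: (1.895, 2.365), 8: (1.860, 2.306), 9: (1.833, 2.262),
    10: (1.812, 2.228),
}
COLUMN = {0.05: 0, 0.025: 1}  # upper-tail area -> column index

def t_alpha(df, tail_area):
    """Look up t such that `tail_area` lies to its right, for `df` degrees of freedom."""
    return T_TABLE[df][COLUMN[tail_area]]

def small_sample_mean_interval(x_bar, s, n, alpha=0.05):
    """x_bar +/- t_{alpha/2} * s / sqrt(n), with n - 1 degrees of freedom."""
    t = t_alpha(n - 1, alpha / 2)
    half_width = t * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical summary statistics: n = 9, so df = 8 and t_.025 = 2.306.
low, high = small_sample_mean_interval(x_bar=20.0, s=3.0, n=9, alpha=0.05)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

A lookup outside the excerpt raises `KeyError`, which is deliberate here: it signals that a fuller table is needed rather than silently extrapolating.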
The Difference Between Two Means
• Suppose independent samples of $n_1$ and $n_2$ observations have been selected from populations with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$
• The sampling distribution of the difference in means ($\bar{x}_1 - \bar{x}_2$) will have the following properties
The Difference Between Two Means
1. The mean and standard deviation of ($\bar{x}_1 - \bar{x}_2$) are $\mu_{(\bar{x}_1 - \bar{x}_2)} = \mu_1 - \mu_2$ and $\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
2. If the sampled populations are normally distributed, the sampling distribution of ($\bar{x}_1 - \bar{x}_2$) is exactly normally distributed regardless of n
3. If the sampled populations are not normally distributed, the sampling distribution of ($\bar{x}_1 - \bar{x}_2$) is approximately normally distributed when $n_1$ and $n_2$ are large
Point Estimation of the Difference Between Two Means
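• Point Estimator: $\widehat{(\mu_1 - \mu_2)} = \bar{x}_1 - \bar{x}_2$
• A $(1-\alpha) \times 100\%$ confidence interval for ($\mu_1 - \mu_2$) is $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$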
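A short sketch of the large-sample interval for a difference between two means. The function name and input values are hypothetical, and the variances are treated as known (or well estimated from large samples).

```python
from math import sqrt
from statistics import NormalDist

def diff_means_interval(x1_bar, x2_bar, var1, var2, n1, n2, alpha=0.05):
    """(x1_bar - x2_bar) +/- z_{alpha/2} * sqrt(var1/n1 + var2/n2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = sqrt(var1 / n1 + var2 / n2)  # standard deviation of (x1_bar - x2_bar)
    diff = x1_bar - x2_bar
    return diff - z * std_error, diff + z * std_error

# Hypothetical inputs: variances 9 and 16, 100 observations per sample.
low, high = diff_means_interval(12.0, 10.0, var1=9.0, var2=16.0, n1=100, n2=100)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```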
Difference Between Two Means (small sample)
• If $n_1$ and $n_2$ are small then the t statistic $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ is distributed according to the t distribution if the following assumptions are satisfied:
1. Both samples are drawn from populations with a normal distribution
2. Both populations have equal variances
Difference Between Two Means (small sample)
• In practice, the t statistic is still appropriate even if the underlying distributions are not exactly normally distributed.
• To compute s, we can pool the information from both samples:
$s^2 = \frac{\sum_{i=1}^{n_1}(x_{1i} - \bar{x}_1)^2 + \sum_{i=1}^{n_2}(x_{2i} - \bar{x}_2)^2}{n_1 + n_2 - 2}$
or
$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$
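The two forms of the pooled variance are algebraically the same, which a quick numerical check can confirm. The data and function names below are invented for illustration.

```python
from statistics import mean, variance

def pooled_variance_from_data(x1, x2):
    """Sum of squared deviations in both samples, divided by n1 + n2 - 2."""
    m1, m2 = mean(x1), mean(x2)
    ss = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
    return ss / (len(x1) + len(x2) - 2)

def pooled_variance_from_summaries(s1_sq, n1, s2_sq, n2):
    """Weighted average of the two sample variances."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / ((n1 - 1) + (n2 - 1))

# Hypothetical small samples.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0]

from_data = pooled_variance_from_data(x1, x2)
from_summaries = pooled_variance_from_summaries(
    variance(x1), len(x1), variance(x2), len(x2)
)
print(from_data, from_summaries)  # the two values should agree
```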
Difference Between Two Means (small sample)
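• Point Estimate: $\widehat{(\mu_1 - \mu_2)} = \bar{x}_1 - \bar{x}_2$
• Interval Estimate: a $(1-\alpha) \times 100\%$ confidence interval for ($\mu_1 - \mu_2$) is $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
where s is computed using the pooled estimate described earlier and $t_{\alpha/2}$ has $n_1 + n_2 - 2$ degrees of freedom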
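A sketch of the small-sample interval, assuming the pooled variance has already been computed and the t value has been read from a t table by the caller. The function name and numbers are hypothetical. The value 2.145 is the standard table entry for an upper-tail area of .025 with 14 degrees of freedom.

```python
from math import sqrt

def pooled_t_interval(x1_bar, x2_bar, pooled_var, n1, n2, t_crit):
    """(x1_bar - x2_bar) +/- t_crit * s * sqrt(1/n1 + 1/n2).

    t_crit is t_{alpha/2} with n1 + n2 - 2 degrees of freedom,
    supplied by the caller from a t table."""
    s = sqrt(pooled_var)  # pooled standard deviation
    half_width = t_crit * s * sqrt(1 / n1 + 1 / n2)
    diff = x1_bar - x2_bar
    return diff - half_width, diff + half_width

# Hypothetical inputs: n1 = n2 = 8, so df = 14 and t_.025 = 2.145.
low, high = pooled_t_interval(12.0, 10.0, pooled_var=4.0, n1=8, n2=8, t_crit=2.145)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```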
Sampling Distribution of Sample Proportions
• Recall from Chapter 6:
– If a random sample of n objects is selected from the population and if x of these possess a characteristic of interest, the sample proportion is $\hat{p} = x/n$
– The sampling distribution of $\hat{p}$ will have a mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{pq/n}$, where $q = 1 - p$
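The mean and standard deviation of the sample proportion can be checked directly from the binomial distribution of x. The sketch below is an illustration with hypothetical values of n and p; it sums over every possible value of x rather than simulating.

```python
from math import comb, sqrt

def phat_moments(n, p):
    """Exact mean and standard deviation of p_hat = x / n when x is binomial(n, p)."""
    q = 1 - p
    probs = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    mean = sum((x / n) * pr for x, pr in enumerate(probs))
    var = sum((x / n - mean) ** 2 * pr for x, pr in enumerate(probs))
    return mean, sqrt(var)

# Hypothetical population proportion and sample size.
n, p = 50, 0.3
mean_phat, sd_phat = phat_moments(n, p)
print(mean_phat, p)                          # mean of p_hat vs p
print(sd_phat, sqrt(p * (1 - p) / n))        # sd of p_hat vs sqrt(pq/n)
```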
Estimators for p
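Assuming n is sufficiently large and the interval $\hat{p} \pm 2\sigma_{\hat{p}}$ lies in the interval from 0 to 1, the estimators are:
• Point Estimator for p: $\hat{p} = x/n$
• Interval Estimator for p: a $(1-\alpha) \times 100\%$ confidence interval for p is $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$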
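A minimal sketch of the interval estimator for p, with a hypothetical function name and made-up counts (40 successes in 100 trials).

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, alpha=0.05):
    """p_hat +/- z_{alpha/2} * sqrt(p_hat * q_hat / n), with p_hat = x / n."""
    p_hat = x / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical sample: 40 of 100 objects have the characteristic.
low, high = proportion_interval(x=40, n=100, alpha=0.05)
print(f"95% CI for p: ({low:.4f}, {high:.4f})")
```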
Estimating the Difference Between Two Binomial Proportions
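• Point estimate: $\widehat{(p_1 - p_2)} = (\hat{p}_1 - \hat{p}_2)$
• Confidence interval for the difference: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$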
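The same pattern extends to two proportions. The sketch below uses hypothetical counts and a hypothetical function name, and assumes both samples are large enough for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_interval(x1, n1, x2, n2, alpha=0.05):
    """(p1_hat - p2_hat) +/- z_{alpha/2} * sqrt(p1_hat*q1_hat/n1 + p2_hat*q2_hat/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * std_error, diff + z * std_error

# Hypothetical samples: 60 of 100 versus 40 of 100.
low, high = diff_proportions_interval(x1=60, n1=100, x2=40, n2=100)
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```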
Choosing Sample Size
• How many measurements should be included in the sample?
– Increasing n increases the precision of the estimate, but increasing n is costly
• Answer depends on:
– What level of confidence do you want to have (i.e., the value of $(1-\alpha) \times 100\%$)?
– What is the maximum difference (B) you want to permit between the estimate of the population parameter and the true population parameter?
Choosing Sample Size
• Once you have chosen B and α, you can solve the following equation for sample size n: $z_{\alpha/2} \times (\text{standard error of the estimator}) = B$
• If the resulting value of n is less than 30 and an estimate
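Solving $z_{\alpha/2} \times (\text{standard error}) = B$ for n gives closed forms for the two estimators covered earlier: with standard error $\sigma/\sqrt{n}$, $n = (z_{\alpha/2}\,\sigma/B)^2$; with standard error $\sqrt{pq/n}$, $n = z_{\alpha/2}^2\,pq/B^2$. The sketch below rounds up to the next whole observation. The function names and inputs are hypothetical, and σ or p must be supplied as a planning guess.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, bound, alpha=0.05):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= bound."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / bound) ** 2)

def sample_size_for_proportion(p, bound, alpha=0.05):
    """Smallest n with z_{alpha/2} * sqrt(p * q / n) <= bound."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z**2 * p * (1 - p) / bound**2)

# Hypothetical planning values.
n_mean = sample_size_for_mean(sigma=10.0, bound=2.0, alpha=0.05)
n_prop = sample_size_for_proportion(p=0.5, bound=0.05, alpha=0.05)
print(n_mean, n_prop)
```

Using p = 0.5 when nothing is known about p is a conservative choice, since pq is largest there.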
Choosing Sample Size