Estimation of Means and Proportions

Concepts

• Estimator: a rule that tells us how to estimate a value for a population parameter using sample data

• Estimate: a specific value of an estimator for particular sample data

• A point estimator is a rule that tells us how to calculate a particular number from sample data to estimate a population parameter

• An interval estimator is a rule that tells us how to calculate two numbers based on sample data, forming a confidence interval within which the parameter is expected to lie

Properties of a Good Estimator

• Unbiasedness: mean of the sampling distribution of the estimator equals the true value of the parameter

• Efficiency: The most efficient estimator among a group of unbiased estimators is the one with the smallest variance
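The two properties above can be checked by simulation. The following is a minimal sketch, not part of the original slides: it assumes a normal population with hypothetical parameters (mu = 10, sigma = 1) and compares the sample mean with the sample median as estimators of the population mean. Both should average out close to mu (unbiasedness), while the sample mean should show the smaller variance (efficiency).

```python
import random
import statistics


def simulate_estimators(mu=10.0, sigma=1.0, n=25, reps=2000, seed=0):
    """Draw `reps` samples of size n from Normal(mu, sigma) and record
    the sample mean and the sample median of each sample.

    All parameter values are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return means, medians


means, medians = simulate_estimators()

# Unbiasedness: the average of each estimator should be close to mu = 10.
print("average of sample means:  ", round(statistics.fmean(means), 3))
print("average of sample medians:", round(statistics.fmean(medians), 3))

# Efficiency: the estimator with the smaller variance is more efficient.
print("variance of sample means:  ", round(statistics.variance(means), 4))
print("variance of sample medians:", round(statistics.variance(medians), 4))
```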

Estimation of a Population Mean

• The CLT suggests that the sample mean may be a good estimator for the population mean. The CLT says that:

– Sampling distribution of sample mean will be approximately normally distributed regardless of the distribution of the sampled population if n is large

– The sample mean is an unbiased estimator

– The standard error of the sample mean is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$

• A point estimator of the population mean is: $\hat{\mu} = \bar{x}$

• An interval estimator of the population mean is a $(1-\alpha) \cdot 100\%$ confidence interval

$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

meaning that, in repeated sampling, intervals constructed this way contain the true population parameter $(1-\alpha) \cdot 100\%$ of the time, where $z_{\alpha/2}$ is the z value corresponding to an area $\alpha/2$ in the upper tail of a standard normal distribution


• Usually σ (the population standard deviation) is unknown.

– If n is large enough (n ≥ 30) then we can approximate it with the sample standard deviation s.
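As an illustration of the large-sample interval for a population mean (sample mean plus or minus z times the standard error), here is a minimal sketch. The function name and the numbers in the example are hypothetical; the sample standard deviation s stands in for σ, which assumes n ≥ 30 as stated above. The z value comes from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(xbar, s, n, confidence=0.95):
    """Large-sample confidence interval for a population mean.

    xbar: sample mean, s: sample standard deviation (used in place of
    sigma, assuming n >= 30), n: sample size.
    """
    alpha = 1 - confidence
    # z_{alpha/2}: the value with area alpha/2 in the upper tail
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * s / sqrt(n)  # z times the standard error s / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical data: n = 100 observations, mean 50, standard deviation 10
low, high = mean_confidence_interval(xbar=50.0, s=10.0, n=100)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```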

One Sided Confidence Intervals

• In some cases we may be interested in the probability the population parameter falls above or below a certain value

• Lower One Sided Confidence Interval (LCL):

– LCL = (point estimate) $- z_{\alpha} \cdot \sigma/\sqrt{n}$

• Upper One Sided Confidence Interval (UCL):

– UCL = (point estimate) $+ z_{\alpha} \cdot \sigma/\sqrt{n}$
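The one-sided bounds differ from the two-sided interval only in the z value: the whole area α sits in one tail, so the multiplier is the z value with area α (not α/2) to its right. A minimal sketch follows, with a hypothetical function name and example numbers.

```python
from statistics import NormalDist


def one_sided_bounds(point_estimate, std_error, confidence=0.95):
    """Return (LCL, UCL) for a one-sided confidence bound.

    The two values are alternatives: use LCL when a lower bound is
    wanted, or UCL when an upper bound is wanted, but not both as a
    single interval.
    """
    # z_alpha: the value with area alpha = 1 - confidence in the upper tail
    z = NormalDist().inv_cdf(confidence)
    lcl = point_estimate - z * std_error
    ucl = point_estimate + z * std_error
    return lcl, ucl


# Hypothetical numbers: point estimate 50 with standard error 1.0
lcl, ucl = one_sided_bounds(point_estimate=50.0, std_error=1.0)
print(f"95% lower bound: {lcl:.3f}")
print(f"95% upper bound: {ucl:.3f}")
```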

Small Sample Estimation of a Population Mean

• If n is large, we can use the sample standard deviation s as a reliable estimator of the population standard deviation

– No matter what distribution the population has, the sampling distribution of the sample mean is approximately normally distributed

• As the sample size n decreases, the sample standard deviation s becomes a less reliable estimator of the population standard deviation (because we are using less information from the underlying distribution to compute s)

• How do we deal with this issue?

t Distribution

• Assume

(1) The underlying population is normally distributed

(2) Sample is small and σ is unknown

• Using the sample standard deviation s to replace σ, the t statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

follows the t distribution

Properties of the t Distribution

• mound-shaped

• perfectly symmetric about t = 0

• more variable than z (the standard normal distribution)

• affected by the sample size n (as n increases s becomes a better approximation for σ)

• n-1 is the degrees of freedom (d.f.) associated with the t statistic

More on the t Distribution

• Remember the t-distribution is based on the assumption that the sampled population possesses a normal probability distribution.

– This is a very restrictive assumption.

• Fortunately, it can be shown that for non-normal but mound-shaped distributions, the distribution of the t statistic is nearly the same shape as the theoretical t-distribution for a normal distribution.

• Therefore the t distribution is still useful for small sample estimation of a population mean even if the underlying distribution of x is not known to be normal

How to use the t-distribution table

• The t-distribution table is in the book (Appendix II, Table 4, p. 611). $t_{\alpha}$ is the value of t such that an area α lies to its right.

To use the table:

• Determine the degrees of freedom

• Determine the appropriate value of α

• Look up the value for $t_{\alpha}$
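The table lookup feeds directly into a small-sample confidence interval for a mean: the sample mean plus or minus the tabulated t value times s/√n, with n − 1 degrees of freedom. The sketch below uses a hypothetical sample of n = 10 measurements, so d.f. = 9, and for a 95% interval the table value with area 0.025 to its right is 2.262. The critical value is passed in by hand because Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev


def small_sample_mean_ci(sample, t_crit):
    """Small-sample confidence interval for a mean.

    t_crit is t_{alpha/2} read from a t table with len(sample) - 1
    degrees of freedom. Assumes an approximately normal population.
    """
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical measurements, n = 10, so d.f. = 9
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.1, 4.9]
T_025_DF9 = 2.262  # table value: area 0.025 to the right, 9 d.f.

low, high = small_sample_mean_ci(sample, T_025_DF9)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```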

Table: t Distribution

The Difference Between Two Means

• Suppose independent samples of $n_1$ and $n_2$ observations have been selected from populations with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$

• The sampling distribution of the difference in sample means $(\bar{x}_1 - \bar{x}_2)$ will have the following properties

1. The mean and standard deviation of $(\bar{x}_1 - \bar{x}_2)$ are

$$\mu_{(\bar{x}_1 - \bar{x}_2)} = \mu_1 - \mu_2, \qquad \sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

2. If the sampled populations are normally distributed, the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ is exactly normally distributed regardless of n

3. If the sampled populations are not normally distributed, the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ is approximately normally distributed when $n_1$ and $n_2$ are large

Point Estimation of the Difference Between Two Means

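• Point Estimator: $(\bar{x}_1 - \bar{x}_2)$ is a point estimator of $(\mu_1 - \mu_2)$

• A $(1-\alpha) \cdot 100\%$ confidence interval for $(\mu_1 - \mu_2)$ is

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$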
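A minimal sketch of the large-sample interval for the difference between two means: the difference in sample means plus or minus z times the square root of the sum of variance-over-n terms. The function name and the summary statistics are hypothetical. The sample variances are substituted for the unknown population variances, which assumes both samples are large (n ≥ 30).

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(xbar1, s1, n1, xbar2, s2, n2, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2.

    s1 and s2 are sample standard deviations used in place of
    sigma1 and sigma2 (assumes n1 and n2 are both at least 30).
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return diff - z * std_error, diff + z * std_error


# Hypothetical summary statistics for two independent samples
low, high = diff_means_ci(xbar1=52.0, s1=8.0, n1=64,
                          xbar2=50.0, s2=6.0, n2=36)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

If the resulting interval contains 0, the data do not rule out equal population means at the chosen confidence level.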

Difference Between Two Means (small sample)

• If $n_1$ and $n_2$ are small then the t statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

is distributed according to the t distribution with $n_1 + n_2 - 2$ degrees of freedom if the following assumptions are satisfied:

1. Both samples are drawn from populations with a normal distribution

2. Both populations have equal variances


• In practice, the t statistic is still appropriate even if the underlying distributions are not exactly normally distributed.

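• To compute s, we can pool the information from both samples:

$$s^2 = \frac{\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 + \sum_{i=1}^{n_2} (x_{2i} - \bar{x}_2)^2}{n_1 + n_2 - 2}$$

or

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$$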


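• Point Estimate: $(\bar{x}_1 - \bar{x}_2)$ is a point estimate of $(\mu_1 - \mu_2)$

• Interval Estimate: a $(1-\alpha) \cdot 100\%$ confidence interval for $(\mu_1 - \mu_2)$ is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where s is computed using the pooled estimate described earlier and $t_{\alpha/2}$ is based on $n_1 + n_2 - 2$ degrees of freedom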
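The pooled small-sample procedure can be sketched as follows. The function names and summary statistics are hypothetical. With n1 = 10 and n2 = 12 there are 20 degrees of freedom, and the table value of t with area 0.025 to its right is 2.086; it is passed in by hand because Python's standard library has no t distribution. The sketch assumes normal populations with equal variances, as the procedure requires.

```python
from math import sqrt


def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate s^2 of the common population variance."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def pooled_diff_means_ci(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Small-sample confidence interval for mu1 - mu2.

    t_crit is t_{alpha/2} from a t table with n1 + n2 - 2 d.f.
    """
    s = sqrt(pooled_variance(s1, n1, s2, n2))
    margin = t_crit * s * sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Hypothetical summary statistics; d.f. = 10 + 12 - 2 = 20
T_025_DF20 = 2.086  # table value: area 0.025 to the right, 20 d.f.
low, high = pooled_diff_means_ci(xbar1=20.0, s1=3.0, n1=10,
                                 xbar2=17.0, s2=4.0, n2=12,
                                 t_crit=T_025_DF20)
print(f"pooled variance: {pooled_variance(3.0, 10, 4.0, 12):.2f}")
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```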

Sampling Distribution of Sample Proportions

• Recall from Chapter 6:

– If a random sample of n objects is selected from the population and if x of these possess a characteristic of interest, the sample proportion is $\hat{p} = x/n$

– The sampling distribution of $\hat{p}$ will have a mean and standard deviation

$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}, \quad \text{where } q = 1 - p$$

Estimators for p

Assuming n is sufficiently large and the interval $\hat{p} \pm 2\sigma_{\hat{p}}$ lies in the interval from 0 to 1, the:

• Point Estimator for p: $\hat{p} = x/n$

• Interval Estimator for p:

A $(1-\alpha) \cdot 100\%$ confidence interval for p is

$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
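A minimal sketch of the interval estimator for a proportion: the sample proportion plus or minus z times the estimated standard error. It includes the check that the sample proportion plus or minus two standard errors stays between 0 and 1. The function name and counts are hypothetical, and raising an error when the check fails is a design choice of this sketch, not something the slides prescribe.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Large-sample confidence interval for a binomial proportion p.

    x: number of sampled objects with the characteristic, n: sample size.
    """
    p_hat = x / n
    q_hat = 1 - p_hat
    std_error = sqrt(p_hat * q_hat / n)  # estimated standard error of p_hat

    # Normal approximation check: p_hat +/- 2 standard errors within (0, 1)
    if not (0 < p_hat - 2 * std_error and p_hat + 2 * std_error < 1):
        raise ValueError("sample too small for the normal approximation")

    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return p_hat - z * std_error, p_hat + z * std_error


# Hypothetical counts: 60 of 200 sampled objects have the characteristic
low, high = proportion_ci(x=60, n=200)
print(f"95% CI for p: ({low:.4f}, {high:.4f})")
```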

Estimating the Difference Between Two Binomial Proportions

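• Point estimate: $(\hat{p}_1 - \hat{p}_2)$ is a point estimate of $(p_1 - p_2)$

• Confidence interval for the difference

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$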
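The same pattern applies to the difference between two binomial proportions: the difference in sample proportions plus or minus z times the square root of the summed variance terms. A minimal sketch with hypothetical counts, assuming both samples are large enough for the normal approximation:

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2.

    x1, x2: counts with the characteristic; n1, n2: sample sizes.
    Assumes two independent samples, both large.
    """
    p1, p2 = x1 / n1, x2 / n2
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    diff = p1 - p2
    return diff - z * std_error, diff + z * std_error


# Hypothetical counts: 120 of 200 in sample 1, 100 of 200 in sample 2
low, high = diff_proportions_ci(x1=120, n1=200, x2=100, n2=200)
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```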

Choosing Sample Size

• How many measurements should be included in the sample?

– Increasing n increases the precision of the estimate, but increasing n is costly

• Answer depends on:

– What level of confidence do you want to have (i.e., the value of $100(1-\alpha)\%$)?

– What is the maximum difference (B) you want to permit between the estimate of the population parameter and the true population parameter?


• Once you have chosen B and α, you can solve the following equation for sample size n:

$$z_{\alpha/2} \times (\text{standard error of the estimator}) = B$$

• If the resulting value of n is less than 30 and an estimate
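Setting z times the standard error equal to the bound B and solving for n gives n = (zσ/B)² for a mean and n = z²pq/B² for a proportion. The sketch below applies both, rounding up to the next whole observation. The function names, the planning value σ = 10, and the conservative guess p = 0.5 are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, bound, confidence=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= bound.

    sigma is a planning value for the population standard deviation.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / bound) ** 2)


def sample_size_for_proportion(bound, p_guess=0.5, confidence=0.95):
    """Smallest n with z_{alpha/2} * sqrt(p * q / n) <= bound.

    p_guess = 0.5 maximizes p * q, so it gives the most conservative n.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p_guess * (1 - p_guess) / bound**2)


# Hypothetical planning values
print(sample_size_for_mean(sigma=10.0, bound=2.0))   # mean within 2 units
print(sample_size_for_proportion(bound=0.05))        # p within 0.05
```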

