SDA 3E Chapter 4

8/4/2019 SDA 3E Chapter 4


2007 Pearson Education

Chapter 4: Sampling and Estimation



Need for Sampling

Very large populations

Destructive testing

Continuous production process

The objective of sampling is to draw a valid inference about a population.



Sample Design

Sampling Plan: a description of the approach that will be used to obtain samples from a population

Objectives

Target population

Population frame

Method of sampling

Operational procedures for data collection

Statistical tools for analysis



Sampling Methods

Subjective

Judgment sampling

Convenience sampling

Probabilistic

Simple random sampling: every subset of a given size has an equal chance of being selected
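Simple random sampling can be illustrated with Python's standard `random.sample`. It draws without replacement, so every subset of size k is equally likely. This is a minimal sketch and is not the PHStat or Excel procedure. The 500-item population frame and the seed are hypothetical.

```python
import random

# Hypothetical population frame: 500 numbered items
population = list(range(1, 501))

rng = random.Random(42)                   # fixed seed so the draw is reproducible
sample = rng.sample(population, k=30)     # without replacement; all 30-item subsets equally likely

print(sorted(sample))
```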



PHStat Tool

Random Sample Generator: PHStat menu > Sampling > Random Sample Generator

Enter sample size

Select sampling method



Excel Data Analysis Tool

Sampling: Excel menu > Tools > Data Analysis > Sampling

Specify input range of data

Choose sampling method

Select output option



Other Sampling Methods

Systematic sampling

Stratified sampling

Cluster sampling

Sampling from a continuous process
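Systematic and stratified sampling are only named on this slide. The sketch below shows the usual textbook versions under stated assumptions:

- Systematic sampling takes every k-th unit after a random start, with k = N // n.
- Stratified sampling uses proportional allocation. This is one common choice, not necessarily the one the chapter uses.

The frame, the stratum names and the seeds are hypothetical.

```python
import random

def systematic_sample(items, n, rng):
    """Every k-th item after a random start in [0, k), with k = len(items) // n."""
    k = len(items) // n
    start = rng.randrange(k)
    return [items[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Proportional allocation: a simple random sample from each stratum, sized by
    the stratum's share of the frame (rounding may make the total differ from n)."""
    total = sum(len(group) for group in strata.values())
    return {
        name: rng.sample(group, round(n * len(group) / total))
        for name, group in strata.items()
    }

rng = random.Random(7)
frame = list(range(1, 101))                    # hypothetical frame of 100 units
print(systematic_sample(frame, 10, rng))       # 10 units spaced 10 apart

strata = {"east": list(range(60)), "west": list(range(60, 100))}   # hypothetical strata
print({name: len(s) for name, s in stratified_sample(strata, 20, rng).items()})
```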



Errors in Sampling

Nonsampling error

Poor sample design

Sampling (statistical) error

Depends on sample size

Tradeoff between the cost of sampling and the accuracy of estimates obtained by sampling



Estimation

Estimation: assessing the value of a population parameter using sample data

Point estimate: a single number used to estimate a population parameter

Confidence interval: a range of values between which a population parameter is believed to lie, along with the probability that the interval correctly estimates the true population parameter



Common Point Estimates



Theoretical Issues

Unbiased estimator: one for which the expected value equals the population parameter it is intended to estimate

The sample variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

is an unbiased estimator for the population variance

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
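The unbiasedness claim can be checked exactly on a tiny example. This sketch makes the following assumptions:

- Sampling is with replacement (independent draws).
- The population is a hypothetical set of four values.

It enumerates every ordered sample of size 2 and averages $s^2$. Exact `Fraction` arithmetic avoids rounding. Dividing by n instead of n - 1 gives a biased average, shown for comparison.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs):
    """s^2 with the n - 1 divisor."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def population_variance(xs):
    """sigma^2 with the N divisor."""
    N = len(xs)
    mu = Fraction(sum(xs), N)
    return sum((x - mu) ** 2 for x in xs) / N

population = [1, 2, 3, 4]                 # hypothetical toy population
sigma2 = population_variance(population)

# Every ordered sample of size 2 drawn with replacement is equally likely,
# so the plain average of s^2 over all of them is its exact expected value.
samples = list(product(population, repeat=2))
mean_s2 = sum(sample_variance(s) for s in samples) / len(samples)
# Same average with the n divisor instead of n - 1
mean_biased = sum(sample_variance(s) * Fraction(len(s) - 1, len(s)) for s in samples) / len(samples)

print(sigma2, mean_s2, mean_biased)       # 5/4 5/4 5/8
```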



Interval Estimates

Interval estimate: a range within which we believe the true population parameter falls

Example: A Gallup poll reports that the percentage of voters favoring a candidate is 56% with a 3% margin of error.

The interval estimate is [53%, 59%].



Confidence Intervals

Confidence interval (CI): an interval estimate that specifies the likelihood that the interval contains the true population parameter

Level of confidence $(1-\alpha)$: the probability that the CI contains the true population parameter, usually expressed as a percentage (90%, 95%, and 99% are most common)



Sampling Distribution of the Mean



Interval Estimate Containing the True Population Mean



Interval Estimate Not Containing the True Population Mean
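Some intervals contain the true mean and some do not. A simulation makes this concrete: repeat the σ-known interval many times and count how often it contains μ. This is a minimal sketch. The normal population (μ = 60, σ = 10), the sample size n = 45, the number of trials and the seed are all hypothetical. The observed rate should be near the confidence level but not exactly equal to it.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(mu, sigma, n, confidence=0.95, trials=5000, seed=1):
    """Fraction of intervals xbar +/- z * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)     # sigma is known, so every interval has the same width
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1                    # this interval contains the true mean
    return hits / trials

print(coverage(mu=60.0, sigma=10.0, n=45))
```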



Confidence Interval for the Mean, $\sigma$ Known

A $100(1-\alpha)\%$ CI is: $\bar{x} \pm z_{\alpha/2}(\sigma/\sqrt{n})$

$z_{\alpha/2}$ may be found from Table A.1 or by using the Excel function NORMSINV(1 - α/2)



Example: Compute a 95 percent confidence interval for the mean number of TV hours/week for the 18-24 age group in the file TV Viewing.xls. Assume that the population standard deviation is known to be 10.0. The sample mean for the n = 45 observations is computed to be 60.16. For a 95 percent CI, $z_{\alpha/2} = 1.96$. Therefore, the CI is

$$60.16 \pm 1.96\,(10/\sqrt{45}) = 60.16 \pm 2.92$$

or [57.24, 63.08].
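The σ-known calculation can be reproduced in a few lines. This sketch uses `statistics.NormalDist().inv_cdf`, which plays the same role as NORMSINV. The helper name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """xbar +/- z_{alpha/2} * sigma / sqrt(n) when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # same role as NORMSINV(1 - alpha/2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = z_interval(60.16, 10.0, 45)
print(f"[{lo:.2f}, {hi:.2f}]")   # [57.24, 63.08]
```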



Confidence Interval for the Mean, $\sigma$ Unknown

A $100(1-\alpha)\%$ CI is: $\bar{x} \pm t_{\alpha/2,\,n-1}(s/\sqrt{n})$

$t_{\alpha/2,\,n-1}$ is the value from a t-distribution with $n-1$ degrees of freedom, found in Table A.2 or with the Excel function TINV(α, n - 1)



Relationship Between Normal Distribution and t-distribution

The t-distribution yields wider confidence intervals for smaller sample sizes.
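The sketch below shows why t-intervals are wider. It computes $t_{\alpha/2,\,df}$ without Table A.2 or TINV, by integrating the t density numerically (Simpson's rule) and bisecting on the CDF. It is an illustrative stand-in that uses only the Python standard library; a statistics package would normally supply this quantile. For small df the critical values sit well above z, and they shrink toward z as df grows. The value for 44 df should agree with the 2.0154 used in the TV-viewing example.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """t_{alpha/2, df}: the point with right-tail area alpha/2, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (5, 10, 30, 44):
    print(df, round(t_critical(0.05, df), 4))
print("z", round(z, 4))
```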



Example: Compute a 95 percent confidence interval for the mean number of TV hours/week for the 18-24 age group in the file TV Viewing.xls. Assume that the population standard deviation is not known but is estimated from the sample as 10.095. A 95 percent CI corresponds to $\alpha/2 = 0.025$. With 45 observations, the t-distribution has 45 - 1 = 44 df. Using Table A.2, we find that $t_{0.025,44} = 2.0154$, yielding a 95 percent CI for the mean of

$$60.16 \pm 2.0154\,(10.095/\sqrt{45}) = 60.16 \pm 3.03$$

or [57.13, 63.19].
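The σ-unknown example can be checked the same way. The Python standard library has no t quantile function, so this sketch takes the critical value as an argument (here 2.0154 from Table A.2). The helper name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """xbar +/- t_{alpha/2, n-1} * s / sqrt(n); t_crit comes from Table A.2 or TINV."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = t_interval(60.16, 10.095, 45, t_crit=2.0154)
print(f"[{lo:.2f}, {hi:.2f}]")   # [57.13, 63.19]
```

With raw data instead of summary statistics, `statistics.mean` and `statistics.stdev` (which uses the n - 1 divisor) would supply `xbar` and `s`.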



PHStat Tool: Confidence Intervals for the Mean

PHStat menu > Confidence Intervals > Estimate for the mean, sigma known, or Estimate for the mean, sigma unknown




PHStat Tool: Confidence Intervals for the Mean - Dialog

Enter the confidence level

Choose specification of sample statistics

Check Finite Population Correction box if appropriate



Sampling From Finite Populations

When n > 0.05N, use a correction factor in computing the standard error:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
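Here is a minimal sketch of the standard error with the finite population correction, applying the n > 0.05N rule of thumb above. The values σ = 10, n = 50 and N = 500 are hypothetical, and `standard_error` is an illustrative name.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """sigma / sqrt(n), times sqrt((N - n) / (N - 1)) when N is given and n > 0.05N."""
    se = sigma / sqrt(n)
    if N is not None and n > 0.05 * N:
        se *= sqrt((N - n) / (N - 1))   # finite population correction
    return se

print(round(standard_error(10, 50), 4))          # no population size given
print(round(standard_error(10, 50, N=500), 4))   # n / N = 0.10 > 0.05, so the correction applies
```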




PHStat Tool: Confidence Intervals for the Mean - Results



Confidence Intervals for Proportions

Sample proportion: $p = x/n$

x = number in sample having desired characteristic

n = sample size

The sampling distribution of p has mean $\pi$ and variance $\pi(1-\pi)/n$

When $n\pi$ and $n(1-\pi)$ are at least 5, the sampling distribution of p approaches a normal distribution




Confidence Interval for Proportions

A $100(1-\alpha)\%$ CI is: $p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$

PHStat tool is available under the Confidence Intervals option
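Here is a sketch of the proportion interval. Since π is unknown, it assumes the sample proportion p stands in for π, both in the formula and in the "at least 5" check. The counts (224 of 400) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """p +/- z_{alpha/2} * sqrt(p(1 - p) / n), with p = x / n."""
    p = x / n
    # Normal-approximation check, using p in place of the unknown pi
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation not justified: need np and n(1-p) >= 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

lo, hi = proportion_interval(224, 400)   # hypothetical: 224 of 400 respondents, p = 0.56
print(f"[{lo:.3f}, {hi:.3f}]")
```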



Confidence Intervals and Sample Size

CI for the mean, $\sigma$ known: the sample size needed for a half-width of at most E is

$$n \ge \frac{(z_{\alpha/2})^2\,\sigma^2}{E^2}$$

CI for a proportion: the sample size needed for a half-width of at most E is

$$n \ge \frac{(z_{\alpha/2})^2\,\pi(1-\pi)}{E^2}$$

Use p as an estimate of $\pi$, or 0.5 for the most conservative estimate
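The two sample-size formulas translate directly. The sample size must be a whole number at least as large as the bound, so the sketch rounds up. The inputs are hypothetical:

- σ = 10 and E = 2 for the mean.
- E = 0.03 with the conservative p = 0.5 for the proportion.

```python
from math import ceil
from statistics import NormalDist

def z_crit(confidence):
    """z_{alpha/2} for a two-sided interval."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def n_for_mean(sigma, E, confidence=0.95):
    """Smallest n with z^2 * sigma^2 / E^2 <= n."""
    return ceil((z_crit(confidence) * sigma / E) ** 2)

def n_for_proportion(E, p=0.5, confidence=0.95):
    """Smallest n with z^2 * p(1 - p) / E^2 <= n; p = 0.5 is the most conservative."""
    return ceil(z_crit(confidence) ** 2 * p * (1 - p) / E ** 2)

print(n_for_mean(sigma=10, E=2))
print(n_for_proportion(E=0.03))
```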



PHStat Tool: Sample Size Determination

PHStat menu > Sample Size > Determination for the Mean or Determination for the Proportion

Enter $\sigma$, E, and confidence level

Check Finite Population Correction box if appropriate




Confidence Interval for Population Total

A $100(1-\alpha)\%$ CI is:

$$N\bar{x} \pm t_{n-1,\alpha/2}\,N\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

PHStat tool is available under the Confidence Intervals option
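Here is a sketch of the population-total interval with hypothetical inputs (N = 1000, n = 50, $\bar{x}$ = 25, s = 8). The critical value $t_{49,0.025} \approx 2.0096$ is passed in rather than computed, as it would be read from Table A.2 or TINV(0.05, 49). The name `total_interval` is illustrative.

```python
from math import sqrt

def total_interval(N, n, xbar, s, t_crit):
    """N*xbar +/- t * N * (s / sqrt(n)) * sqrt((N - n) / (N - 1))."""
    estimate = N * xbar
    half_width = t_crit * N * (s / sqrt(n)) * sqrt((N - n) / (N - 1))
    return estimate - half_width, estimate + half_width

# Hypothetical inputs; t_{49, 0.025} is about 2.0096 (Table A.2 or TINV(0.05, 49))
lo, hi = total_interval(N=1000, n=50, xbar=25.0, s=8.0, t_crit=2.0096)
print(f"[{lo:.1f}, {hi:.1f}]")
```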




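Confidence Interval for Differences Between Means

                      Population 1    Population 2
Mean                  $\mu_1$         $\mu_2$
Standard deviation    $\sigma_1$      $\sigma_2$
Point estimate        $\bar{x}_1$     $\bar{x}_2$
Sample size           $n_1$           $n_2$

The point estimate for the difference in means, $\mu_1 - \mu_2$, is given by $\bar{x}_1 - \bar{x}_2$.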



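Independent Samples With Unequal Variances

A $100(1-\alpha)\%$ CI is:

$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,df^*}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where

$$df^* = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$

df*: fractional values are rounded down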



Example: In the Accounting Professionals.xls worksheet, find a 95 percent confidence interval for the difference in years of service between males and females.



Calculations: $s_1 = 4.39$ and $n_1 = 14$ (females), $s_2 = 8.39$ and $n_2 = 13$ (males)

df* = 17.81, so use 17 as the degrees of freedom
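The df* value for the Accounting Professionals example can be reproduced from the quoted summary statistics. This sketch computes the Welch-Satterthwaite degrees of freedom and rounds them down. The sample means are not given here, so it stops at df* and the standard error. The name `welch_df` is illustrative.

```python
from math import floor, sqrt

def welch_df(s1, n1, s2, n2):
    """df* = (s1^2/n1 + s2^2/n2)^2 / [(s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1)]."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

df = welch_df(4.39, 14, 8.39, 13)            # females, males from the example
se = sqrt(4.39 ** 2 / 14 + 8.39 ** 2 / 13)   # standard error of xbar1 - xbar2
print(round(df, 2), floor(df))               # 17.81 17
```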



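Independent Samples With Equal Variances

A $100(1-\alpha)\%$ CI is:

$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

where $s_p$ is a common pooled standard deviation. This interval requires assuming that the variances of the two populations are equal.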
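For comparison, here is a sketch of the pooled standard deviation and standard error. It uses the same summary statistics as the unequal-variances calculation ($s_1 = 4.39$, $n_1 = 14$, $s_2 = 8.39$, $n_2 = 13$) and assumes equal variances, as the formula requires. The degrees of freedom are $n_1 + n_2 - 2 = 25$. The name `pooled_sd` is illustrative.

```python
from math import sqrt

def pooled_sd(s1, n1, s2, n2):
    """s_p = sqrt(((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2))."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

sp = pooled_sd(4.39, 14, 8.39, 13)
se = sp * sqrt(1 / 14 + 1 / 13)   # standard error of xbar1 - xbar2 under equal variances
df = 14 + 13 - 2
print(round(sp, 3), round(se, 3), df)
```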



Example: Accounting Professionals



Paired Samples

A $100(1-\alpha)\%$ CI is: $\bar{D} \pm t_{n-1,\alpha/2}\, s_D/\sqrt{n}$

$$s_D = \sqrt{\frac{\sum_{i=1}^{n}(D_i - \bar{D})^2}{n-1}}$$

$D_i$ = difference for each pair of observations

$\bar{D}$ = average of differences

PHStat tool available in the Confidence Intervals menu
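Here is a sketch of the paired-samples interval. Note the following assumptions:

- The two lists are hypothetical paired measurements, not the Pile Foundation data.
- The critical value $t_{4,0.025} \approx 2.7764$ is passed in as a table lookup.

`statistics.stdev` uses the n - 1 divisor, matching $s_D$.

```python
from math import sqrt
from statistics import mean, stdev

def paired_interval(first, second, t_crit):
    """Dbar +/- t_{n-1, alpha/2} * s_D / sqrt(n), with D_i = first_i - second_i."""
    d = [a - b for a, b in zip(first, second)]
    dbar = mean(d)
    s_d = stdev(d)                       # n - 1 divisor, as in s_D above
    half_width = t_crit * s_d / sqrt(len(d))
    return dbar - half_width, dbar + half_width

method_a = [10.0, 12.5, 9.0, 11.0, 13.5]   # hypothetical paired measurements
method_b = [9.5, 11.0, 9.5, 10.0, 12.0]
# t_{4, 0.025} is about 2.7764 (Table A.2 or TINV(0.05, 4))
lo, hi = paired_interval(method_a, method_b, t_crit=2.7764)
print(f"[{lo:.3f}, {hi:.3f}]")
```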



Example: Pile Foundation.xls

A 95% CI for the average difference between the actual and estimated pile lengths is



Differences Between Proportions

A $100(1-\alpha)\%$ CI is:

$$p_1 - p_2 \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

Applies when $n_i p_i$ and $n_i(1-p_i)$ are greater than 5
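This sketch applies the formula to the CPA counts quoted in the Accounting Professionals example (8 of 14 females, 6 of 13 males) to show how the interval is assembled. The name `diff_proportions_interval` is hypothetical, and z comes from `statistics.NormalDist` rather than Table A.1.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_interval(x1, n1, x2, n2, confidence=0.95):
    """(p1 - p2) +/- z_{alpha/2} * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    for n, p in ((n1, p1), (n2, p2)):
        if n * p <= 5 or n * (1 - p) <= 5:
            raise ValueError("normal approximation needs n*p and n*(1-p) > 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - half_width, (p1 - p2) + half_width

# CPA counts from the Accounting Professionals example: females 8/14, males 6/13
lo, hi = diff_proportions_interval(8, 14, 6, 13)
print(f"[{lo:.3f}, {hi:.3f}]")
```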



Example: In the Accounting Professionals.xls worksheet, the proportion of females having a CPA is 8/14 = 0.57, while the proportion of males having a CPA is 6/13 = 0.46. A 95 percent confidence interval for the difference in proportions between females and males is



Sampling Distribution of s

The sample standard deviation, s, is a point estimate for the population standard deviation, $\sigma$.

The sampling distribution of s is described through the chi-square ($\chi^2$) distribution: for samples from a normal population, $(n-1)s^2/\sigma^2$ has a $\chi^2$ distribution with $n-1$ df. See Table A.3.

CHIDIST(x, deg_freedom) returns the probability to the right of x

CHIINV(probability, deg_freedom) returns the value of x for a specified right-tail probability



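Confidence Intervals for the Variance

A $100(1-\alpha)\%$ CI is:

$$\left[\frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}\right]$$

where $\chi^2_{n-1,\,q}$ is the value with right-tail probability q, that is, CHIINV(q, n - 1). Note the difference in the denominators!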
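Here is a sketch of the variance interval. The inputs n = 10 and s = 2.0 are hypothetical. The chi-square values for 9 df are standard table values: 19.023 has right-tail probability 0.025, and 2.700 has right-tail probability 0.975. They are passed in as arguments because the Python standard library has no chi-square quantile function. Taking square roots of the endpoints gives an interval for σ.

```python
def variance_interval(n, s, chi2_upper, chi2_lower):
    """[(n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower].

    chi2_upper: value with right-tail probability alpha/2 (the larger one)
    chi2_lower: value with right-tail probability 1 - alpha/2 (the smaller one)
    """
    ss = (n - 1) * s ** 2
    return ss / chi2_upper, ss / chi2_lower

# Hypothetical n = 10, s = 2.0, 95% confidence, 9 df:
# CHIINV(0.025, 9) is about 19.023 and CHIINV(0.975, 9) is about 2.700 (Table A.3)
lo, hi = variance_interval(10, 2.0, chi2_upper=19.023, chi2_lower=2.700)
print(f"variance: [{lo:.3f}, {hi:.3f}]")
print(f"std dev:  [{lo ** 0.5:.3f}, {hi ** 0.5:.3f}]")
```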



PHStat Tool: Confidence Intervals for Variance - Dialog

PHStat menu > Confidence Intervals > Estimate for the Population Variance

Enter sample size, standard deviation, and confidence level



PHStat Tool: Confidence Intervals for Variance - Results



Time Series Data

Confidence intervals only make sense for stationary time series data.



Summary and Conclusions

As the confidence level $(1-\alpha)$ increases, the width of the confidence interval also increases.

As the sample size increases, the width of the confidence interval decreases.
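Both conclusions can be seen in the σ-known interval, whose full width is $2 z_{\alpha/2}\sigma/\sqrt{n}$. The sketch below uses a hypothetical σ = 10. Raising the confidence level increases z, and quadrupling n halves the width.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma, n, confidence):
    """Full width 2 * z_{alpha/2} * sigma / sqrt(n) of the sigma-known interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)

sigma = 10.0   # hypothetical known population standard deviation
for confidence in (0.90, 0.95, 0.99):
    widths = [round(ci_width(sigma, n, confidence), 2) for n in (25, 100, 400)]
    print(f"{confidence:.0%}: widths for n = 25, 100, 400 -> {widths}")
```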



Probability Intervals

A $100(1-\alpha)\%$ probability interval for a random variable X is any interval [a, b] such that $P(a \le X \le b) = 1-\alpha$.

Do not confuse a confidence interval with a probability interval; confidence intervals are probability intervals for sampling distributions, not for the distribution of the random variable.
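Here is a short sketch of the distinction, assuming a hypothetical normal population (μ = 60, σ = 10) and samples of n = 45. The code builds two intervals:

- A 95% probability interval for a single observation X.
- A 95% probability interval for the sampling distribution of $\bar{x}$. Centered at an observed $\bar{x}$ instead of μ, this becomes the σ-known confidence interval.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 60.0, 10.0, 45        # hypothetical population parameters and sample size
z = NormalDist().inv_cdf(0.975)

# 95% probability interval for one observation X ~ N(mu, sigma)
prob_interval = (mu - z * sigma, mu + z * sigma)
# 95% probability interval for xbar, whose sampling distribution is N(mu, sigma / sqrt(n))
mean_interval = (mu - z * sigma / sqrt(n), mu + z * sigma / sqrt(n))

X = NormalDist(mu, sigma)
print(X.cdf(prob_interval[1]) - X.cdf(prob_interval[0]))   # probability mass inside [a, b]
print(prob_interval, mean_interval)
```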
