Basic Sampling Theory for Simple and Cluster Samples Malcolm Rosier Survey Design and Analysis Services Pty Ltd http://survey-design.com.au Copyright © 2000

Basic Sampling Theory for Simple and Cluster Samples

Malcolm Rosier

Survey Design and Analysis Services Pty Ltd

http://survey-design.com.au

Copyright © 2000

Sample design

• The focus of the design for a sample must be on the magnitude of the standard errors of sampling rather than on an arbitrary percentage of the target population.

• The standard errors are used to calculate confidence intervals around the sample data.

Standard errors

The next sequence aims to explain standard errors, and how they relate to the underlying target population and a sample drawn from this population.

Graph: Target population

• Population: mean = μ, standard deviation = σ

[Figure: probability density (population) plotted against scores; horizontal axis from -4.0 to 4.0, vertical axis from 0.00 to 0.40]

Graph: Sample

• Sample: mean = x̄, standard deviation = s

[Figure: probability density (sample) plotted against scores; horizontal axis from -4.0 to 4.0, vertical axis from 0.00 to 0.40]

Graph: Means from many samples

• However, we could get many different samples with different sample means from the population.

[Figure: probability density (sample) for several samples plotted against scores; horizontal axis from -4.0 to 4.0, vertical axis from 0.00 to 0.40]

Graph: Distribution of sample means

• This gives us a sampling distribution of sample means:

[Figure: probability density (means) plotted against sample means; horizontal axis from -4.0 to 4.0, vertical axis from 0.00 to 0.40]

Sampling distribution of sample means

• normal distribution

• mean = μ = mean of underlying population distribution

• standard deviation = σ / √n
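
The properties listed above can be checked with a small simulation. The following is only a sketch: the normal population, the values of mu, sigma and n, the number of samples and the function name are illustrative assumptions, not taken from the slides.

```python
import math
import random
import statistics

def simulate_sample_means(mu, sigma, n, n_samples, seed=0):
    """Draw n_samples simple random samples of size n from a normal
    population and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_samples)
    ]

# Illustrative values only (assumptions, not from the slides).
mu, sigma, n = 0.0, 1.0, 25
means = simulate_sample_means(mu, sigma, n, n_samples=5000)

observed_sd = statistics.stdev(means)    # spread of the simulated sample means
theoretical_sd = sigma / math.sqrt(n)    # sigma / sqrt(n)

print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"sd of sample means:   {observed_sd:.3f} (theory: {theoretical_sd:.3f})")
```

With enough simulated samples, the mean of the sample means should sit close to the population mean and their standard deviation close to sigma divided by the square root of n.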

Standard error of a population mean

The standard deviation of the sampling distribution of sample means is termed the standard error of a mean.

standard error of population mean = σ / √n

Central limit theorem

The central limit theorem states that the link between the mean of one sample and the population mean is given by:

x̄ = μ ± z · se(popn mean)

If z = 1.96 we produce a confidence interval where we find 95% of the sample means, relative to μ.

Estimated population mean

However, we are not interested in finding the sample means given the population mean.

Our aim is to locate the population mean given what we know about the sample.

Estimated population mean

We start with a simple random sample and assume:

• s is a good estimate of σ

• se(sample mean) is a good estimate of se(popn mean)

Estimated population mean

Then instead of

x̄ = μ ± z · se(popn mean)

we can write

μ = x̄ ± z · se(sample mean)

where

se(sample mean) = s / √n
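
As a rough illustration of estimating the population mean from one sample, the sketch below computes the sample mean, the standard error s divided by the square root of n, and a 95 per cent interval using z = 1.96. The scores and the function name are hypothetical, and with a sample this small the normal approximation is only indicative.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Return (low, high) for the population mean, using
    se(sample mean) = s / sqrt(n) and a normal multiplier z."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)
    return x_bar - z * se, x_bar + z * se

# Hypothetical scores from a simple random sample (not from the slides).
scores = [52, 47, 61, 55, 49, 58, 50, 53, 60, 45]
low, high = mean_confidence_interval(scores)

print(f"sample mean = {statistics.fmean(scores):.1f}")
print(f"95% interval for the population mean: {low:.1f} to {high:.1f}")
```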

Standard error of a proportion (srs)

The standard error of a percentage (proportion) is:

se(prop) = √[p(1-p)/n]

Standard error of a proportion = 0.50 (srs)

For p=0.50

se(p50) = √[0.50(1-0.50)/n]

The standard error may be multiplied by a finite population correction (FPC) of √[(N-n)/N], where N is the size of the target population.
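
A minimal sketch of the calculation for a proportion follows. The function name and the population size used in the example are assumptions; the correction is applied here as the square root of (N-n)/N, which is the form that applies to a standard error rather than to a variance.

```python
import math

def se_proportion(p, n, N=None):
    """Standard error of a proportion p for a simple random sample of size n.

    If the target population size N is given, the result is multiplied by
    the finite population correction sqrt((N - n) / N).
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / N)
    return se

se_plain = se_proportion(0.50, 1000)            # no correction
se_fpc = se_proportion(0.50, 1000, N=10_000)    # hypothetical population of 10,000

print(f"se(p50), n=1000:           {se_plain:.4f}")
print(f"se(p50), n=1000, N=10000:  {se_fpc:.4f}")
```

The correction only matters when the sample is a sizeable fraction of the target population; for a large population it is close to 1.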

Confidence intervals

• Confidence intervals are usually expressed at the 95 per cent level (1.96 standard errors of sampling for a proportion)

Table: Effect of sample size on standard error
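
The table itself is not reproduced here. As a stand-in, the sketch below tabulates se(p50) and the 95 per cent margin (1.96 standard errors) for a few sample sizes under simple random sampling. The sample sizes are illustrative assumptions and are not taken from the original table.

```python
import math

def se_p50(n):
    """Standard error of a proportion of 0.50 under simple random sampling."""
    return math.sqrt(0.50 * (1 - 0.50) / n)

# Illustrative sample sizes only; they are not taken from the original table.
sample_sizes = [100, 400, 1000, 1600, 2500]

rows = [(n, se_p50(n), 1.96 * se_p50(n)) for n in sample_sizes]

print(f"{'n':>6} {'se(p50)':>9} {'95% margin':>11}")
for n, se, margin in rows:
    print(f"{n:>6} {se:>9.4f} {margin:>11.4f}")
```

Because the standard error depends on the square root of n, quadrupling the sample size only halves the standard error.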

Two stage samples

• The most efficient method is usually sampling at the first stage with probability proportional to size (pps).

• This produces a self-weighting sample.

• Easier logistics for administration.

Two stage samples

Stage 1

Primary sampling units (psu) are selected with a probability proportional to the size of the target population in the psu.

Example of psu: schools

Two stage samples

Stage 2

A cluster of secondary sampling units (ssu) is selected at random from each psu.

Example of ssu: students in schools
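
The two stages can be sketched as follows. This is one common way to draw a pps sample (systematic selection along the cumulative size scale), offered as an assumption rather than as the method used by the author. The school names, enrolments, number of psu and cluster size are all hypothetical.

```python
import random

def pps_systematic(sizes, n_psu, rng):
    """Select n_psu units with probability proportional to size, using a
    random start and a fixed interval on the cumulative size scale."""
    total = sum(sizes.values())
    interval = total / n_psu
    start = rng.random() * interval          # random start in [0, interval)
    targets = [start + k * interval for k in range(n_psu)]
    selected = []
    cumulative = 0
    k = 0
    for name, size in sizes.items():
        cumulative += size
        # Every target that falls inside this unit's size range selects it.
        while k < n_psu and targets[k] < cumulative:
            selected.append(name)
            k += 1
    return selected

rng = random.Random(1)

# Hypothetical schools (psu) with their enrolments (size of target population).
enrolments = {"A": 120, "B": 300, "C": 80, "D": 200, "E": 150, "F": 250, "G": 100}

# Stage 1: select schools with probability proportional to enrolment.
chosen = pps_systematic(enrolments, n_psu=3, rng=rng)

# Stage 2: select a random cluster of 20 students (ssu) from each chosen school.
cluster_size = 20
clusters = {
    school: rng.sample(range(enrolments[school]), cluster_size)
    for school in chosen
}

for school, students in clusters.items():
    print(school, sorted(students)[:5], "...")
```

In this sketch a student's overall selection probability is (3 × enrolment / 1200) × (20 / enrolment) = 60 / 1200, the same in every school, which is why a design of this kind is self-weighting. A psu larger than the sampling interval could be selected more than once; the hypothetical enrolments here avoid that case.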

Deff

• Two-stage sampling is less efficient than a simple random sample (srs) of the same size.

deff = (standard error of sampling for complex sample)² / (standard error of sampling for srs)²

Deft

• The square root of deff is deft, which gives the ratio of the standard errors of sampling.

deft = (standard error of sampling for complex sample) / (standard error of sampling for srs)

Simple equivalent sample

• The simple equivalent sample (ses) is the size of a simple random sample which has the same standard error as the complex sample.

• We sometimes use the term effective sample size (neff).

Simple equivalent sample

The size of simple equivalent sample = size of complex sample / deff

deff = 1 + (rho)(b-1) = 1 + (0.10)(20-1) = 2.9

where

rho = intraclass correlation

b = cluster size
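
The arithmetic above can be reproduced with a short sketch. The function name is an assumption, and the complex sample size of 1000 is used purely as an example.

```python
import math

def design_effect(rho, b):
    """deff = 1 + rho * (b - 1), with rho the intraclass correlation
    and b the cluster size."""
    return 1 + rho * (b - 1)

deff = design_effect(rho=0.10, b=20)
deft = math.sqrt(deff)           # ratio of the standard errors of sampling

complex_n = 1000                 # example size of the complex sample
ses = complex_n / deff           # size of the simple equivalent sample

print(f"deff = {deff:.2f}, deft = {deft:.2f}")
print(f"simple equivalent sample for n={complex_n}: {ses:.0f}")
```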

Compare srs and ses

For a simple random sample of n=1000, the 95 per cent confidence interval is given by:

= ± 1.96 se(p50)

= ± 1.96 √[0.50(1-0.50)/1000]

= ± 0.031 = ± 3.1%

Compare srs and ses

For a simple equivalent sample of n=345 (corresponding to a complex sample of n=1000), the 95 per cent confidence interval is given by:

= ± 1.96 se(p50)

= ± 1.96 √[0.50(1-0.50)/345]

= ± 0.053 = ± 5.3%
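
The comparison can be checked numerically with the sketch below, which computes the 95 per cent margin for a proportion of 0.50 at n=1000 and at n=345. The function and variable names are assumptions.

```python
import math

def margin_95(p, n):
    """Half-width of the 95 per cent confidence interval for a proportion p,
    treating the n cases as a simple random sample."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

margin_srs = margin_95(0.50, 1000)   # simple random sample of 1000
margin_ses = margin_95(0.50, 345)    # simple equivalent sample of 345

ratio = margin_ses / margin_srs      # close to deft = sqrt(2.9)

print(f"srs, n=1000: +/- {margin_srs:.3f}")
print(f"ses, n=345:  +/- {margin_ses:.3f}")
print(f"ratio of margins: {ratio:.2f}")
```

The ratio of the two margins is approximately deft, so a two-stage sample of 1000 with deff = 2.9 gives an interval about 1.7 times as wide as a simple random sample of the same size.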

Table: Values for deff and simple equivalent sample

Random clusters

• PPS sampling assumes a random cluster of ssu which is the same size for each psu.

• In practice, we often draw an intact group at random.

• This usually increases the intraclass correlation for the sample.

Weighting

• Achieved samples are unlikely to properly represent the proportions of persons in the target populations for the strata.

• Weights are applied so that the achieved sample for each stratum represents its proportion in the total target population.

Weighting

w_h = N_h / n_h

where

n_h = the size of the achieved sample for stratum h

N_h = the size of the target population for stratum h
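
A minimal sketch of the weight calculation follows, using two hypothetical strata. The stratum names, population sizes, achieved sample sizes and sample proportions are all invented for illustration.

```python
# Hypothetical strata: target population sizes and achieved sample sizes.
population = {"urban": 6000, "rural": 4000}
achieved = {"urban": 450, "rural": 150}

# Weight for each stratum: population size divided by achieved sample size.
weights = {h: population[h] / achieved[h] for h in population}

# Hypothetical sample proportions observed in each stratum.
sample_prop = {"urban": 0.60, "rural": 0.40}

# Unweighted estimate: every sampled person counts once.
unweighted = (
    sum(achieved[h] * sample_prop[h] for h in achieved) / sum(achieved.values())
)

# Weighted estimate: every sampled person counts weights[h] times.
weighted = (
    sum(weights[h] * achieved[h] * sample_prop[h] for h in achieved)
    / sum(weights[h] * achieved[h] for h in achieved)
)

for h in weights:
    print(f"{h}: weight = {weights[h]:.2f}")
print(f"unweighted estimate = {unweighted:.3f}")
print(f"weighted estimate   = {weighted:.3f}")
```

Here the rural stratum is under-represented in the achieved sample, so its cases receive the larger weight and the weighted estimate moves towards the rural proportion.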