Ratio estimation with stratified samples

• Consider the agriculture stratified sample. In addition to the data of 1992, we also have data of 1987. Suppose that the population data of 1987 are available. How can we combine the two techniques?

Method 1: combined ratio estimator

• Step 1: combine strata to estimate $t_x$ and $t_y$

$$\hat t_{x,str} = \sum_{h=1}^{H} \hat t_{xh}, \qquad \hat t_{y,str} = \sum_{h=1}^{H} \hat t_{yh}$$

• Step 2: use ratio estimation

$$\hat t_{yrc} = \hat B\, t_x, \quad \text{where } \hat B = \frac{\hat t_{y,str}}{\hat t_{x,str}}$$

$$\hat V(\hat t_{yrc}) = \left(\frac{t_x}{\hat t_{x,str}}\right)^2 \left[\hat V(\hat t_{y,str}) + \hat B^2\,\hat V(\hat t_{x,str}) - 2\hat B\,\widehat{Cov}(\hat t_{x,str}, \hat t_{y,str})\right]$$

where

$$\widehat{Cov}(\hat t_{x,str}, \hat t_{y,str}) = \sum_{h=1}^{H} \widehat{Cov}(\hat t_{xh}, \hat t_{yh}) = \sum_{h=1}^{H} N_h^2\left(1-\frac{n_h}{N_h}\right)\frac{s_{xyh}}{n_h}$$

and $s_{xyh}$ is the sample covariance of x and y in stratum h.

Method 2: separate ratio estimators

• Step 1: use ratio estimation in each stratum

• Step 2: combine strata

$$\hat t_{yrs} = \sum_{h=1}^{H} \hat t_{yhr} = \sum_{h=1}^{H} t_{xh}\,\frac{\hat t_{yh}}{\hat t_{xh}}$$

$$\hat V(\hat t_{yrs}) = \sum_{h=1}^{H} \hat V(\hat t_{yhr})$$

Method 1 vs Method 2

• If the ratios vary from stratum to stratum, use method 2

• If sample sizes are small, use method 1

• Poststratification is a special case of method 2
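The two methods can be compared on a toy case. The sketch below assumes the per-stratum estimated totals and the known stratum totals of x are already available; the function names and all numbers are made up for illustration and are not from the agriculture data.

```python
from typing import Sequence


def combined_ratio_total(t_hat_x: Sequence[float],
                         t_hat_y: Sequence[float],
                         t_x: float) -> float:
    """Method 1: pool the strata first, then apply a single ratio."""
    b_hat = sum(t_hat_y) / sum(t_hat_x)
    return b_hat * t_x


def separate_ratio_total(t_hat_x: Sequence[float],
                         t_hat_y: Sequence[float],
                         t_x_strata: Sequence[float]) -> float:
    """Method 2: one ratio per stratum, then add the stratum estimates."""
    return sum(tx * ty_hat / tx_hat
               for tx_hat, ty_hat, tx in zip(t_hat_x, t_hat_y, t_x_strata))


# Hypothetical stratum-level estimates (two strata).
t_hat_x = [100.0, 200.0]     # estimated totals of x per stratum
t_hat_y = [110.0, 260.0]     # estimated totals of y per stratum
t_x_strata = [120.0, 210.0]  # known population totals of x per stratum

combined = combined_ratio_total(t_hat_x, t_hat_y, sum(t_x_strata))
separate = separate_ratio_total(t_hat_x, t_hat_y, t_x_strata)
print(combined, separate)
```

The stratum ratios here differ (1.1 and 1.3), which is the situation where the two methods give different answers.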

Cluster Sampling

A new sampling method

• Motivating example
– Want to study the average amount of water used per person
– How would you design a survey?

• Consider the two strategies
– Sample person by person
– Sample household by household

• Which one do you prefer and why?

• In the water usage example, I would sample households; in other words, I would use the household as the sampling unit.

• I do this for convenience: I am interested in average monthly usage per person, but I sample households.

• The water usage example is an example of cluster sampling
– Households are the primary sampling units (PSUs) or clusters
– Persons are the secondary sampling units (SSUs). They are the elements in the population

Definition of Cluster Sampling

• Take an SRS of clusters

• Individual elements of the population are allowed in the sample only if they belong to a cluster (primary sampling unit) that is included in the sample

• The sampling unit (PSU) is not the same as the observation unit (SSU), and the two sizes of experimental units must be considered when calculating standard errors from cluster samples

Stratified sampling vs Cluster sampling

• The two sampling methods look similar
– A cluster is also a grouping of elements of the population

• But the sampling schemes are different
– Stratified: SRS from each stratum
– Cluster: SRS of the clusters. For each selected cluster, we select all its elements
– See the following two slides

Stratified sampling

Cluster sampling

Stratified sampling vs Cluster sampling

• Stratified sampling
– The variance of the estimate of $\bar y_U$ depends on the variability of values within strata
– For greater precision, individual elements within each stratum should have similar values, but stratum means should differ from each other as much as possible
– Stratified sampling usually improves on the precision of SRS

Stratified sampling vs Cluster sampling

• Cluster sampling
– The cluster is the sampling unit
– The more clusters we sample, the smaller the variance
– The variance of the estimate of $\bar y_U$ depends primarily on the variability between cluster means
– For greater precision, individual elements within each cluster should be heterogeneous and cluster means should be similar to one another
– Cluster sampling usually ??? the precision of SRS

Why does cluster sampling tend to reduce precision?

• Elements of the same cluster tend to be more similar than elements selected at random from the whole population. E.g.,
– Members of the same household tend to have similar political views
– Fish in the same lake tend to have similar concentrations of mercury
– Residents of the same nursing home tend to have similar opinions of the quality of care

• The similarities arise because of some underlying factors that may or may not be measurable
– Residents of the same nursing home may have similar opinions because the care is poor
– The concentration of mercury in the fish will reflect the concentration of mercury in the lake
• Because of the similarities of elements within clusters, we do not obtain as much information

• By sampling everyone in the cluster, we partially repeat the same information instead of obtaining new information

• As a result, cluster sampling leads to less precision for estimates of population quantities

Motivation of using cluster sampling

• Constructing a sampling frame list of observation units may be difficult, expensive, or impossible
– Cannot list all honeybees in a region

• The population may be widely distributed geographically or may occur in natural clusters
– Nursing home residents cluster in nursing homes

• Cluster sampling leads to convenience and reduced cost

• Cluster sampling may result in more information per dollar spent

Versions of cluster sampling: one-stage vs two-stage cluster sampling

• We will consider one-stage and two-stage sampling
– One-stage sampling: every element within a sampled cluster is included in the sample
– Two-stage sampling: we subsample only some of the elements of selected clusters

One-stage cluster sampling

[Figure: panels (1), (2), (3)]

Two-stage cluster sampling

[Figure: panels (1), (2), (3)]

Notation for cluster sampling

One-stage cluster sampling

• Every element within a sampled cluster (PSU) is included in the sample

• Either “all” or “none” of the elements that compose a cluster (PSU) are in the sample

• $M_i = m_i$, $\bar y_i = \bar y_{iU}$, $s_i^2 = S_i^2$, $\hat t_i = t_i$

Clusters of equal sizes

• All clusters have the same size: $M_i = m_i = M$
– Most naturally occurring clusters do not fit into this framework
– Can occur in agricultural and industrial sampling
– Estimating population means or totals is simple

• We treat the cluster means or totals as the observations and simply ignore the individual elements

• We have an SRS of n observations $\{t_i, i \in S\}$, where $t_i$ is the total for all the elements in PSU i.

Clusters of equal sizes

Nothing is new here

Clusters of equal sizes: an example

Clusters of equal sizes: sampling weights

Theory of Cluster sampling with equal sizes

• In one-stage cluster sampling, the variability of the unbiased estimator of t depends entirely on the between-cluster part of the variability

• For cluster sampling
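As a sketch of the comparison being set up here, assume equal cluster sizes M, an SRS of n of the N clusters, and, as the benchmark, an SRS of nM elements. Writing $S_t^2$ for the variance of the cluster totals (so that $S_t^2 = M\,MSB$) and $S^2$ for the element-level population variance, the two variances would be:

```latex
V(\hat t) = N^2\left(1-\frac{n}{N}\right)\frac{S_t^2}{n}
          = N^2\left(1-\frac{n}{N}\right)\frac{M\,MSB}{n},
\qquad
V(\hat t_{SRS}) = N^2\left(1-\frac{n}{N}\right)\frac{M\,S^2}{n}
```

Under these assumptions the ratio of the two is $MSB/S^2$, so the within-cluster variability does not enter the cluster-sample variance at all.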

• When MSB/MSW is large
– MSB is relatively large: elements in different clusters vary more than elements in the same cluster
– Cluster sampling is less precise than SRS

• If MSB > S^2, cluster sampling is less precise than SRS

Measurements of correlation

• ICC (or ρ): Intraclass (or intracluster) Correlation Coefficient
– Describes how similar elements in the same cluster are
– Provides a measure of homogeneity within the clusters

• Definition: $ICC = 1 - \frac{M}{M-1}\cdot\frac{SSW}{SSTO}$

• It can be shown that $-\frac{1}{M-1} \le ICC \le 1$

• If SSB=0, then SSW = SSTO and $ICC = -\frac{1}{M-1}$

One-stage cluster sampling with equal sizes vs SRS

If N is large:

• 1+(M-1)ICC SSU's taken in a one-stage cluster sample give the same amount of information as one SSU from an SRS
– E.g., with ICC=1/2 and M=5, 1+(M-1)ICC=3, so 300 SSU's in the cluster sample carry the same information as 100 SSU's in an SRS

• If ICC<0, cluster sampling is more efficient than SRS

• ICC is rarely negative in naturally occurring clusters
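The arithmetic in the example above can be checked with a few lines. This is a minimal sketch of the large-N approximation only; the function names are illustrative, not part of the lecture.

```python
def design_effect(icc: float, cluster_size: int) -> float:
    """Approximate variance inflation of a one-stage cluster sample
    relative to an SRS with the same number of SSUs (N large)."""
    return 1.0 + (cluster_size - 1) * icc


def effective_srs_size(n_ssus: float, icc: float, cluster_size: int) -> float:
    """Number of SRS observations carrying the same information."""
    return n_ssus / design_effect(icc, cluster_size)


deff = design_effect(0.5, 5)                  # ICC = 1/2, M = 5
equivalent = effective_srs_size(300, 0.5, 5)  # 300 SSUs in the cluster sample
print(deff, equivalent)
```

With a negative ICC the same function returns a value below 1, matching the remark that cluster sampling is then more efficient than SRS.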

The GPA example

The population ANOVA table (estimated)

• The sample mean square total should not be used to estimate $S^2$ when n is small

• The data were collected as a cluster sample, so they do not reflect enough of the cluster-to-cluster variability

• Multiply the unbiased estimates of MSB and MSW by the df from the population ANOVA table to estimate the population sums of squares
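The last bullet can be sketched as follows, assuming equal cluster sizes M, N clusters in the population, and the usual population degrees of freedom (N-1 between, N(M-1) within, NM-1 total). The input values are made up, not the GPA data, and the adjusted $R^2$ is taken as $1 - MSW/S^2$.

```python
def population_anova(msb: float, msw: float, N: int, M: int) -> dict:
    """Estimate population sums of squares from sample mean squares."""
    ssb = (N - 1) * msb            # between-cluster sum of squares
    ssw = N * (M - 1) * msw        # within-cluster sum of squares
    ssto = ssb + ssw               # total sum of squares
    s2 = ssto / (N * M - 1)        # estimate of the population variance S^2
    icc = 1.0 - (M / (M - 1)) * ssw / ssto
    r2_adj = 1.0 - msw / s2
    return {"SSB": ssb, "SSW": ssw, "SSTO": ssto,
            "S2": s2, "ICC": icc, "R2_adj": r2_adj}


# Hypothetical mean squares from a one-stage cluster sample.
table = population_anova(msb=2.0, msw=1.0, N=10, M=4)
print(table)
```

Estimating $S^2$ this way, rather than from the sample mean square total, gives the between-cluster variability its population share.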

Clusters of unequal sizes

• The adjusted $R^2$ measures the relative amount of variability in the population explained by the cluster means, adjusted for the number of degrees of freedom

• If the clusters are homogeneous, then the cluster means are highly variable relative to the variation within clusters, and the adjusted $R^2$ will be high.

An example

Clusters of unequal sizes

• In social surveys, clusters are usually of unequal sizes

• In a one-stage sample, we will introduce two methods to estimate the population total/mean
– Unbiased estimation
– Ratio estimation

Unbiased estimation for cluster sampling with unequal sizes

• Nothing is different from cluster sampling with equal sizes

• The problem is that the between-cluster variance is large when the cluster sizes differ widely, since we expect large totals from large clusters

• Therefore, we consider another estimator

Ratio estimation for cluster sampling with unequal sizes

Note that

• The variance of the ratio estimator depends on the variability of the means per element in the clusters

• It can be much smaller than that of the unbiased estimator

• The ratio estimator requires the total number of elements in the population, K.

• The unbiased estimator does not require K.
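A small sketch of the two one-stage estimators, under the assumption of an SRS of n clusters out of N with every element of a sampled cluster observed. The standard-error formulas are the usual textbook ones (the ratio version uses the sample mean cluster size); the data, the value of K, and the function names are hypothetical.

```python
from math import sqrt
from statistics import variance


def unbiased_total(totals, N):
    """Unbiased estimator (N/n) * sum(t_i) with its standard error."""
    n = len(totals)
    t_hat = N * sum(totals) / n
    se = N * sqrt((1 - n / N) * variance(totals) / n)
    return t_hat, se


def ratio_mean(totals, sizes, N):
    """Ratio estimator sum(t_i) / sum(M_i) of the mean per element."""
    n = len(totals)
    y_bar = sum(totals) / sum(sizes)
    m_bar = sum(sizes) / n  # sample mean cluster size
    resid_ss = sum((t - y_bar * m) ** 2 for t, m in zip(totals, sizes))
    se = sqrt((1 - n / N) * resid_ss / ((n - 1) * n * m_bar ** 2))
    return y_bar, se


# Hypothetical data: three sampled clusters out of N = 30.
totals = [22.0, 43.0, 30.0]  # cluster totals t_i
sizes = [4, 9, 6]            # cluster sizes M_i
N, K = 30, 190               # K: assumed number of elements in the population

t_unb, se_unb = unbiased_total(totals, N)
y_ratio, se_ratio = ratio_mean(totals, sizes, N)
t_ratio, se_t_ratio = K * y_ratio, K * se_ratio
print(t_unb, se_unb, t_ratio, se_t_ratio)
```

In this toy case the cluster totals track the cluster sizes closely, so the ratio estimator of the total has a much smaller standard error than the unbiased one.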

Two-stage cluster sampling

• In one-stage cluster sampling, we
– Examine all the SSU’s within the selected PSU’s
– Obtain redundant information because SSU’s in a PSU tend to be similar
– Incur a high cost

• An alternative: take a subsample within each selected PSU (two-stage cluster sampling)

Two-stage cluster sampling with equal probability

• Compared with the one-stage cluster sampling, the two-stage uses one extra stage.

• The extra stage complicates the notation and estimators, as one needs to consider variability arising from both stages of data collection

• The point estimates are similar to those in one-stage sampling, but the variances are much more complicated

Two-stage cluster sampling with equal probability: an unbiased estimator

• Since we do not observe every SSU in the sampled PSU’s, we need to estimate the totals for the sampled PSU’s

• An unbiased estimator of the population total is

$$\hat t_{unb} = \frac{N}{n}\sum_{i\in S}\hat t_i$$

where $\hat t_i$ is the estimated total of sampled PSU i.

• The estimator is unbiased. With $Z_i = 1$ if PSU i is in the sample and $Z_i = 0$ otherwise,

$$E[\hat t_{unb}] = E\left[\frac{N}{n}\sum_{i\in S}\hat t_i\right] = \frac{N}{n}E\left[\sum_{i=1}^{N} Z_i\hat t_i\right] = \frac{N}{n}\sum_{i=1}^{N}E[Z_i]\,E[\hat t_i] = \frac{N}{n}\sum_{i=1}^{N}\frac{n}{N}\,t_i = t$$

• Because the $\hat t_i$'s are random variables, the variance of $\hat t_{unb}$ has two components
– The variability between PSU’s
– The variability within PSU’s

Recall that Var[Y] = Var[E[Y|X]] + E[Var[Y|X]]

Here $Y = \hat t_{unb}$ and $X = (Z_1, \ldots, Z_N)$.

$$
\begin{aligned}
Var[\hat t_{unb}] &= Var\left[\frac{N}{n}\sum_{i\in S}\hat t_i\right] = \left(\frac{N}{n}\right)^2 Var\left[\sum_{i=1}^{N} Z_i\hat t_i\right] \\
&= \left(\frac{N}{n}\right)^2\left\{Var\left[E\left[\sum_{i=1}^{N} Z_i\hat t_i \,\Big|\, Z\right]\right] + E\left[Var\left[\sum_{i=1}^{N} Z_i\hat t_i \,\Big|\, Z\right]\right]\right\} \\
&= \left(\frac{N}{n}\right)^2\left\{Var\left[\sum_{i=1}^{N} Z_i t_i\right] + E\left[\sum_{i=1}^{N} Z_i^2\,Var[\hat t_i]\right]\right\} \\
&= N^2\left(1-\frac{n}{N}\right)\frac{S_t^2}{n} + \left(\frac{N}{n}\right)^2\sum_{i=1}^{N}E[Z_i]\,Var[\hat t_i] \\
&= N^2\left(1-\frac{n}{N}\right)\frac{S_t^2}{n} + \frac{N}{n}\sum_{i=1}^{N}M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{S_i^2}{m_i}
\end{aligned}
$$

Here $S_t^2$ is the population variance of the PSU totals and $S_i^2$ is the variance of the elements within PSU i. The first term is the between-PSU component and the second is the within-PSU component.

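It can be shown that an unbiased estimator of the variance is

$$\hat V(\hat t_{unb}) = N^2\left(1-\frac{n}{N}\right)\frac{s_t^2}{n} + \frac{N}{n}\sum_{i\in S}\left(1-\frac{m_i}{M_i}\right)M_i^2\,\frac{s_i^2}{m_i}$$

where $s_t^2 = \frac{1}{n-1}\sum_{i\in S}\left(\hat t_i - \hat t_{unb}/N\right)^2$ and $s_i^2$ is the sample variance of the SSU's sampled in PSU i.

For the population mean, with K the total number of elements in the population,

$$\hat{\bar y}_{unb} = \frac{\hat t_{unb}}{K}, \qquad SE(\hat{\bar y}_{unb}) = \frac{SE(\hat t_{unb})}{K}$$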
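A minimal sketch of the two-stage unbiased estimator and its variance estimate, assuming an SRS of PSUs followed by an SRS of SSUs within each sampled PSU, and taking $\hat t_i = M_i \bar y_i$. The data and the function name are hypothetical; every sampled PSU needs at least two observations for the within-PSU variance to exist.

```python
from math import sqrt
from statistics import mean, variance


def two_stage_unbiased(samples, sizes, N):
    """samples[i]: y-values subsampled in PSU i; sizes[i]: M_i; N: PSUs in population."""
    n = len(samples)
    t_hats = [M * mean(ys) for ys, M in zip(samples, sizes)]  # estimated PSU totals
    t_unb = N / n * sum(t_hats)
    # Between-PSU part: variance of the estimated PSU totals.
    between = N ** 2 * (1 - n / N) * variance(t_hats) / n
    # Within-PSU part: subsampling variance inside each sampled PSU.
    within = N / n * sum(
        (1 - len(ys) / M) * M ** 2 * variance(ys) / len(ys)
        for ys, M in zip(samples, sizes)
    )
    return t_unb, sqrt(between + within)


# Hypothetical data: n = 3 PSUs sampled out of N = 10, two SSUs measured in each.
samples = [[1.0, 3.0], [2.0, 4.0], [5.0, 7.0]]
sizes = [4, 6, 5]
t_unb, se = two_stage_unbiased(samples, sizes, N=10)
print(t_unb, se)
```

In this toy case the between-PSU term accounts for most of the estimated variance, which is typical when cluster totals differ a lot.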

Two-stage cluster sampling with equal probability: a ratio estimator

As in one-stage cluster sampling with unequal sizes, the between-PSU variance can be very large since it is affected both by variations in the cluster sizes and by variation in y.

The egg volume example

• A study (Arnold 1991) on egg volume of American coot eggs in Minnesota. We looked at volumes of a subsample of eggs in clutches (nests of eggs) with at least two eggs.

• For each sampled clutch, two eggs were measured

N is unknown but presumably large.

Using weights in cluster samples

• For estimating overall means and totals in cluster samples, most survey statisticians use sampling weights.

• Weights can be used to find a point estimate of almost any quantity of interest

• For cluster sampling:
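As an illustration of how weights work in a two-stage cluster sample, the sketch below assumes an SRS of n of N PSUs and an SRS of $m_i$ of the $M_i$ SSUs in each, so that each sampled SSU in PSU i gets weight $(N/n)(M_i/m_i)$, the inverse of its selection probability. The data and function names are hypothetical.

```python
def cluster_weights(N, n, sizes, subsample_sizes):
    """One weight per sampled PSU, shared by all SSUs subsampled in it."""
    return [(N / n) * (M / m) for M, m in zip(sizes, subsample_sizes)]


def weighted_estimates(samples, weights):
    """Weighted total, and weighted mean as total / sum of weights."""
    total = sum(w * y for ys, w in zip(samples, weights) for y in ys)
    weight_sum = sum(w * len(ys) for ys, w in zip(samples, weights))
    return total, total / weight_sum


# Hypothetical data: 3 of 10 PSUs sampled, 2 SSUs measured in each.
samples = [[1.0, 3.0], [2.0, 4.0], [5.0, 7.0]]
sizes = [4, 6, 5]
weights = cluster_weights(N=10, n=3, sizes=sizes,
                          subsample_sizes=[len(ys) for ys in samples])
t_hat, y_bar_hat = weighted_estimates(samples, weights)
print(weights, t_hat, y_bar_hat)
```

The weighted total reproduces $\frac{N}{n}\sum_{i\in S} M_i \bar y_i$, and the weighted mean divides by the estimated number of elements rather than a known K.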

SRS : one-stage cluster: two-stage cluster

• For simplicity, we only consider $M_1 = \cdots = M_N = M$ and $m_1 = \cdots = m_N = m$

• One estimator from each of the three sampling methods
– $\hat t_{SRS}$: from SRS
– $\hat t_{unb}$: the unbiased estimator from two-stage cluster
– $\hat t_1$: the estimator from one-stage cluster

Assume nm SSUs are sampled.

$$\hat t_{unb} = \frac{N}{n}\sum_{i\in S}\hat t_i$$

$$
\begin{aligned}
Var[\hat t_{unb}] &= N^2\left(1-\frac{n}{N}\right)\frac{S_t^2}{n} + \frac{N}{n}\sum_{i=1}^{N}M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{S_i^2}{m_i} \\
&= N^2\left(1-\frac{n}{N}\right)\frac{S_t^2}{n} + \frac{N}{n}\cdot\frac{M-m}{Mm}\,M^2\sum_{i=1}^{N}S_i^2
\end{aligned}
$$

• Recall that $(M-1)\sum_{i=1}^{N}S_i^2 = SSW = N(M-1)\,MSW$ and $S_t^2 = M\,MSB$

• Therefore,

$$
\begin{aligned}
Var[\hat t_{unb}] &= N^2\left(1-\frac{n}{N}\right)\frac{S_t^2}{n} + \frac{N}{n}\cdot\frac{M-m}{Mm}\,M^2\sum_{i=1}^{N}S_i^2 \\
&= N^2\left(1-\frac{n}{N}\right)\frac{M}{n}\,MSB + \frac{N}{n}\cdot\frac{M-m}{Mm}\,M^2 N\,MSW
\end{aligned}
$$

• We have defined ICC (ρ)

$$\rho = 1 - \frac{M}{M-1}\cdot\frac{SSW}{SSTO} = 1 - \frac{M}{M-1}\cdot\frac{N(M-1)\,MSW}{(NM-1)S^2} = 1 - \frac{NM}{NM-1}\cdot\frac{MSW}{S^2}$$

so that

$$MSW = \frac{NM-1}{NM}\,S^2(1-\rho)$$

$$
\begin{aligned}
MSB &= \frac{SSTO - SSW}{N-1} = \frac{1}{N-1}\left[(NM-1)S^2 - N(M-1)\,MSW\right] \\
&= \frac{1}{N-1}\left[(NM-1)S^2 - \frac{(M-1)(NM-1)}{M}\,S^2(1-\rho)\right] \\
&= \frac{NM-1}{M(N-1)}\,S^2\left[1+(M-1)\rho\right]
\end{aligned}
$$

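$$
\begin{aligned}
Var[\hat t_{unb}] &= N^2\left(1-\frac{n}{N}\right)\frac{M}{n}\,MSB + \frac{N}{n}\cdot\frac{M-m}{Mm}\,M^2 N\,MSW \\
&= N^2\left(1-\frac{n}{N}\right)\frac{M}{n}\cdot\frac{NM-1}{M(N-1)}\,S^2[1+(M-1)\rho] + \frac{N}{n}\cdot\frac{M-m}{Mm}\,M^2 N\cdot\frac{NM-1}{NM}\,S^2(1-\rho) \\
&= \frac{NM(NM-1)}{nm}\,S^2\left\{\left(1-\frac{n}{N}\right)\frac{N}{N-1}\cdot\frac{m}{M}[1+(M-1)\rho] + \left(1-\frac{m}{M}\right)(1-\rho)\right\} \\
&\approx \frac{N^2M^2}{nm}\,S^2\left\{\frac{m}{M}[1+(M-1)\rho] + \frac{M-m}{M}(1-\rho)\right\} \\
&= \frac{N^2M^2}{nm}\,S^2\left[1+(m-1)\rho\right]
\end{aligned}
$$

The approximation holds when N is large and n/N is small.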

• If we use nm SSU’s in a one-stage cluster sample, the number of PSU’s is n’ = nm/M

$$
\begin{aligned}
Var[\hat t_1] &= N^2\left(1-\frac{nm/M}{N}\right)\frac{S_t^2}{nm/M} = N^2\left(1-\frac{nm}{NM}\right)\frac{M\,S_t^2}{nm} \\
&= N^2\left(1-\frac{nm}{NM}\right)\frac{M^2}{nm}\,MSB \\
&= N^2\left(1-\frac{nm}{NM}\right)\frac{M^2}{nm}\cdot\frac{NM-1}{M(N-1)}\,S^2[1+(M-1)\rho] \\
&\approx \frac{N^2M^2}{nm}\,S^2\left[1+(M-1)\rho\right]
\end{aligned}
$$

• If we use nm SSU’s in an SRS

$$Var[\hat t_{SRS}] = (NM)^2\left(1-\frac{nm}{NM}\right)\frac{S^2}{nm} \approx \frac{N^2M^2}{nm}\,S^2$$

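$$Var[\hat t_{SRS}] = (NM)^2\left(1-\frac{nm}{NM}\right)\frac{S^2}{nm} \approx \frac{N^2M^2}{nm}\,S^2$$

$$Var[\hat t_{unb}] \approx \frac{N^2M^2}{nm}\,S^2\left[1+(m-1)\rho\right]$$

$$Var[\hat t_1] \approx \frac{N^2M^2}{nm}\,S^2\left[1+(M-1)\rho\right]$$

When $\rho > 0$, $Var[\hat t_{SRS}] \le Var[\hat t_{unb}] \le Var[\hat t_1]$.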
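The ordering of the three variances can be checked numerically. The sketch below uses only the large-N approximations (finite population corrections ignored) with equal cluster sizes; the parameter values are made up for illustration.

```python
def approx_variances(N, M, n, m, s2, icc):
    """Large-N approximate variances of the estimated total when nm SSUs are sampled."""
    base = (N * M) ** 2 * s2 / (n * m)
    return {
        "srs": base,                               # SRS of nm elements
        "two_stage": base * (1 + (m - 1) * icc),   # n PSUs, m SSUs in each
        "one_stage": base * (1 + (M - 1) * icc),   # nm/M whole PSUs
    }


# Hypothetical design: 1000 PSUs of size 10, sample 20 PSUs with 5 SSUs each.
v = approx_variances(N=1000, M=10, n=20, m=5, s2=4.0, icc=0.2)
print(v)
```

With a positive ICC the two-stage design sits between SRS and one-stage sampling, because spreading the same nm observations over more PSUs repeats less information.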

Design a cluster survey

• For an expensive, large-scale survey, it is worth spending a great deal of effort on the design

• It can take several years to design and pre-test

• For designing a cluster sample
– What overall precision is needed?
– What size should the PSU’s be?
– How many SSU’s should be sampled in each sampled PSU?
– How many PSU’s should be sampled?

Choosing the PSU size

• In many situations, the PSU size exists naturally. E.g., a clutch of eggs, a household

• In some situations, one needs to choose PSU sizes. E.g., the area of a region: 1 km², 2 km², …

• Many ways to “try out” different PSU sizes

• Pilot study, perform an experiment

• The goal is to get the most information for the least cost and inconvenience

Two-stage cluster design with equal cluster size and equal variance

With total cost $C = c_1 n + c_2 nm$, so that $n = C/(c_1 + c_2 m)$:

$$
\begin{aligned}
Var[\hat{\bar y}_{unb}] &= \frac{1}{(NM)^2}\,Var[\hat t_{unb}] = \left(1-\frac{n}{N}\right)\frac{MSB}{nM} + \left(1-\frac{m}{M}\right)\frac{MSW}{nm} \\
&= \frac{MSB}{nM} - \frac{MSB}{NM} + \frac{MSW}{nm} - \frac{MSW}{nM} \\
&= \frac{1}{n}\left[\frac{MSB-MSW}{M} + \frac{MSW}{m}\right] - \frac{MSB}{NM} \\
&= \frac{c_1 + c_2 m}{C}\left[\frac{MSB-MSW}{M} + \frac{MSW}{m}\right] - \frac{MSB}{NM} \\
&= \frac{1}{C}\left[\frac{c_2\,(MSB-MSW)}{M}\,m + \frac{c_1\,MSW}{m}\right] + \text{constant}
\end{aligned}
$$

The minimum is reached when

$$m = \sqrt{\frac{c_1 M\,MSW}{c_2\,(MSB-MSW)}} = \sqrt{\frac{c_1 M(N-1)(1-R_a^2)}{c_2\,(NM-1)\,R_a^2}}$$

where $R_a^2 = 1 - MSW/S^2$ is the adjusted $R^2$.

• Graphing the variance for varying m and n gives more information

• It is useful to examine
– What if the costs or the cost function are slightly different?
– What if $R_a^2$ changes slightly?

$$m = \sqrt{\frac{c_1 M(N-1)(1-R_a^2)}{c_2\,(NM-1)\,R_a^2}}$$
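The optimal subsample size can be explored numerically. The sketch below assumes the cost function $C = c_1 n + c_2 nm$ (a fixed cost $c_1$ per PSU and $c_2$ per SSU) and requires $R_a^2 > 0$; the costs, budget, and population sizes are made up for illustration.

```python
from math import sqrt


def optimal_subsample_size(c1, c2, N, M, r2_adj):
    """Subsample size m that minimises the variance for a fixed total cost."""
    return sqrt(c1 * M * (N - 1) * (1 - r2_adj) / (c2 * (N * M - 1) * r2_adj))


def psus_for_budget(C, c1, c2, m):
    """Number of PSUs affordable when each PSU costs c1 plus c2 per sampled SSU."""
    return C / (c1 + c2 * m)


# Hypothetical inputs.
m_opt = optimal_subsample_size(c1=40.0, c2=10.0, N=100, M=20, r2_adj=0.5)
m_used = round(m_opt)  # a practical integer choice near the optimum
n_used = psus_for_budget(C=1200.0, c1=40.0, c2=10.0, m=m_used)
print(m_opt, m_used, n_used)
```

Rerunning with slightly different costs or a slightly different $R_a^2$ shows how sensitive the design is: a smaller $R_a^2$ (less homogeneous clusters) pushes the optimum toward more SSUs per PSU.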

The GPA example

Summary of two-stage cluster

• Cluster sampling is widely used in large surveys

• Variances from cluster samples are usually greater than those from SRSs with the same number of SSUs

• Cluster sampling is less expensive, so the information per dollar might be greater than that of SRS