Source: jkim.public.iastate.edu/teaching/chapter3.pdf

Chapter 3: Element sampling design: Part 1

Jae-Kwang Kim

Fall, 2014

Simple random sampling

1 Simple random sampling

2 SRS with replacement

3 Systematic sampling



Simple Random Sampling

Motivation: Choose n units from N units without replacement.

1 Each subset of n distinct units is equally likely to be selected.

2 There are $\binom{N}{n}$ samples of size n from N.

3 Give equal probability of selection to each subset with n units.

Definition

Sampling design for SRS:

$$P(A) = \begin{cases} 1 \big/ \binom{N}{n} & \text{if } |A| = n \\ 0 & \text{otherwise.} \end{cases}$$



Lemma

Under SRS, the inclusion probabilities are

$$\pi_i = \frac{n}{N}, \qquad \pi_{ij} = \frac{n(n-1)}{N(N-1)} \quad \text{for } i \neq j.$$
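As a quick check of the lemma, the sketch below enumerates every sample of a small population and adds up the design probabilities of the samples that contain unit 1 (for $\pi_i$) and units 1 and 2 together (for $\pi_{ij}$). The function name and the choice N = 6, n = 3 are illustrative assumptions, not part of the original notes.

```python
from fractions import Fraction
from itertools import combinations


def srs_inclusion_probabilities(N, n):
    """Return (pi_1, pi_12) under SRS by brute-force enumeration."""
    # Every subset of size n has the same design probability 1 / C(N, n).
    samples = list(combinations(range(1, N + 1), n))
    p = Fraction(1, len(samples))
    pi_1 = sum(p for A in samples if 1 in A)
    pi_12 = sum(p for A in samples if 1 in A and 2 in A)
    return pi_1, pi_12


# Hypothetical small case: N = 6, n = 3.
pi_1, pi_12 = srs_inclusion_probabilities(N=6, n=3)
```

By symmetry the same values hold for any unit i and any pair i ≠ j, so checking units 1 and 2 is enough.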



Theorem

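Under SRS design, the HT estimator

$$\hat{Y}_{HT} = \frac{N}{n} \sum_{i \in A} y_i = N \bar{y}$$

is unbiased for Y and has variance of the form

$$V\left(\hat{Y}_{HT}\right) = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) S^2$$

where

$$S^2 = \frac{1}{2} \frac{1}{N} \frac{1}{N-1} \sum_{i=1}^{N} \sum_{j=1}^{N} (y_i - y_j)^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left(y_i - \bar{Y}\right)^2.$$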



Theorem (Cont’d)

Also, the SYG variance estimator is

$$\hat{V}\left(\hat{Y}_{HT}\right) = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) s^2$$

where

$$s^2 = \frac{1}{n-1} \sum_{i \in A} (y_i - \bar{y})^2.$$

Thus, under SRS, $E(s^2) = S^2$.
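The theorem can be checked numerically on a toy population. The sketch below computes the HT estimate and the variance estimate for one sample, then averages both over all possible samples. The population values and function name are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from statistics import mean, variance


def ht_total_srs(y_sample, N):
    """HT estimate of the total and its variance estimate under SRS."""
    n = len(y_sample)
    s2 = variance(y_sample)  # sample variance with divisor n - 1
    total_hat = N * mean(y_sample)
    var_hat = N**2 / n * (1 - n / N) * s2
    return total_hat, var_hat


y = [3.0, 7.0, 1.0, 8.0, 5.0]  # hypothetical population, N = 5
N, n = len(y), 3

# All C(N, n) samples are equally likely, so design expectations are
# plain averages over the enumerated samples.
results = [ht_total_srs(list(A), N) for A in combinations(y, n)]
mean_total_hat = mean(t for t, _ in results)
mean_var_hat = mean(v for _, v in results)

# True design variance from the theorem; variance(y) uses divisor N - 1,
# which matches the definition of S^2.
true_var = N**2 / n * (1 - n / N) * variance(y)
```

Up to floating-point error, `mean_total_hat` should equal the population total and `mean_var_hat` should equal `true_var`, which is the unbiasedness stated above.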



Remark (under SRS)

$1 - n/N$ is often called the finite population correction (FPC) term.

The FPC term can be ignored (FPC $\doteq 1$) if the sampling rate n/N is small (≤ 0.05) or for conservative inference.

For n = 1, the variance of the sample mean is

$$\frac{1}{n} \left(1 - \frac{n}{N}\right) S^2 = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \bar{Y}\right)^2 \equiv \sigma_Y^2.$$

Central limit theorem: under some conditions,

$$V^{-1/2} \left(\hat{Y}_{HT} - Y\right) = \frac{\bar{y} - \bar{Y}}{\sqrt{\frac{1}{n} \left(1 - \frac{n}{N}\right) S^2}} \to N(0, 1).$$



Remark (under SRS)

Sample size determination

1 Choose the target variance $V^*$ for $V(\bar{y})$.

2 Choose n the smallest integer satisfying

$$\frac{1}{n} \left(1 - \frac{n}{N}\right) S^2 \le V^*.$$

For dichotomous y (taking 0 or 1), may use $S^2 \doteq P(1 - P) \le 1/4$. A simple rule is $n \ge d^{-2}$, where d is the margin of error.
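The two steps above can be sketched as a direct search over n. The sketch assumes the margin of error is defined as $d = 2\sqrt{V(\bar{y})}$ (roughly a 95% interval), so that $V^* = (d/2)^2$. With $S^2 \le 1/4$ and the FPC ignored this gives $n \ge d^{-2}$. That definition of d, the function name, and the numbers are assumptions made for illustration.

```python
def srs_sample_size(S2, V_star, N):
    """Smallest n in 1..N with (1/n)(1 - n/N) * S2 <= V_star."""
    for n in range(1, N + 1):
        # (1/n)(1 - n/N) is the same as 1/n - 1/N.
        if (1 / n - 1 / N) * S2 <= V_star:
            return n
    return N  # a census (n = N) has zero variance


# Dichotomous y, worst case S2 = 1/4, margin of error d = 0.05.
d = 0.05
V_star = (d / 2) ** 2                       # assumes d = 2 * sqrt(V(ybar))
n_finite = srs_sample_size(0.25, V_star, N=10_000)
n_large = srs_sample_size(0.25, V_star, N=10**9)  # FPC is negligible here
```

For a very large N the answer approaches the simple rule $d^{-2} = 400$, while for N = 10,000 the FPC lowers the required sample size slightly.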



How to select a simple random sample of size n from the finite population?

Draw-by-draw procedure

Rejective Bernoulli sampling method

Sample Reservoir method

Random sorting method



Draw-by-draw procedure

For example, consider U = {1, 2, · · · ,N} and n = 2.

In the first draw, select one element with equal probability.

In the second draw, select one element with equal probability from U − {a1}, where a1 is the element selected in the first draw. Let a2 be the element selected in the second draw.



Draw-by-draw procedure (Cont’d)

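$$\begin{aligned}
P(a_1, a_2) &= P(a_1) P(a_2 \mid U - \{a_1\}) + P(a_2) P(a_1 \mid U - \{a_2\}) \\
&= \frac{1}{N} \cdot \frac{1}{N-1} + \frac{1}{N} \cdot \frac{1}{N-1} \\
&= \frac{2}{N(N-1)},
\end{aligned}$$

which equals $1 / \binom{N}{2}$, the SRS design probability for n = 2.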

We can prove similar results for general n. (Use mathematical induction).
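A minimal sketch of the draw-by-draw procedure for general n: at each draw, one of the units not yet selected is chosen with equal probability. The function name and the optional `rng` argument are illustrative assumptions.

```python
import random


def draw_by_draw(N, n, rng=None):
    """Select n of the units 1..N, one draw at a time, without replacement."""
    if rng is None:
        rng = random.Random()
    remaining = list(range(1, N + 1))
    sample = []
    for _ in range(n):
        # Equal probability over the units that are still in the list.
        idx = rng.randrange(len(remaining))
        sample.append(remaining.pop(idx))
    return sample


sample = draw_by_draw(N=10, n=4, rng=random.Random(1))
```

The list is returned in draw order. Ignoring the order gives the sample A.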



Rejective Bernoulli sampling method

1 Apply Bernoulli sampling of expected size n.

I1, · · · , IN ∼ Bernoulli(f )

where f = n/N.

2 Check if the realized sample size is n. If yes, accept the sample. Otherwise, go to Step 1.



Rejective Bernoulli sampling method (Cont’d)

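Justification:

$$P\left(I_1, I_2, \cdots, I_N \,\Big|\, \sum_{i=1}^{N} I_i = n\right) = \frac{\prod_{i=1}^{N} f^{I_i} (1 - f)^{1 - I_i}}{\binom{N}{n} f^n (1 - f)^{N-n}} = \frac{1}{\binom{N}{n}} \quad \text{if } \sum_{i=1}^{N} I_i = n.$$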
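The two steps of the rejective method can be sketched as a loop that repeats Bernoulli sampling until the realized size equals n. The function name and the `rng` argument are assumptions for illustration. The number of repetitions is random, so this is a sketch of the idea rather than an efficient implementation.

```python
import random


def rejective_bernoulli(N, n, rng=None):
    """Repeat Bernoulli(f = n/N) sampling until exactly n units are selected."""
    if rng is None:
        rng = random.Random()
    f = n / N
    while True:
        # Step 1: independent indicators I_k ~ Bernoulli(f), k = 1..N.
        sample = [k for k in range(1, N + 1) if rng.random() < f]
        # Step 2: accept only if the realized sample size is n.
        if len(sample) == n:
            return sample


sample = rejective_bernoulli(N=20, n=5, rng=random.Random(3))
```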



Reservoir method (McLeod and Bellhouse, 1983)

1 The first n units are selected into the sample.

2 For each k = n + 1, · · · , N:

   1 Select k with probability n/k.

   2 If unit k is selected, remove one element from the current sample with equal probability.

   3 Unit k takes the place of the removed unit.

Note that the population size is not necessarily known. You can stop at any point in the process, and you will then have a simple random sample from the finite population considered up to that point.
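The reservoir steps above can be sketched as a single pass over a stream of units, which is why N does not need to be known in advance. The function name and the `rng` argument are illustrative assumptions.

```python
import random


def reservoir_sample(stream, n, rng=None):
    """One-pass SRS of size n from an iterable of unknown length."""
    if rng is None:
        rng = random.Random()
    sample = []
    for k, unit in enumerate(stream, start=1):
        if k <= n:
            # Step 1: the first n units enter the sample.
            sample.append(unit)
        elif rng.random() < n / k:
            # Step 2: unit k is selected with probability n/k and replaces
            # a current member chosen with equal probability.
            sample[rng.randrange(n)] = unit
    return sample


sample = reservoir_sample(range(1, 101), n=5, rng=random.Random(0))
```

If the stream has fewer than n units, the sketch simply returns all of them.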



Random sorting method

1 A value of an independent uniform variable in [0,1] is allocated to each unit of the population.

2 The population is sorted in ascending (or descending) order of these values.

3 The first n units of the sorted population are selected in the sample.
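A minimal sketch of the random sorting method: attach a uniform key to each unit, sort by the key, and keep the first n units. The function name and the `rng` argument are assumptions for illustration.

```python
import random


def random_sort_sample(units, n, rng=None):
    """SRS of size n by sorting the units on independent uniform keys."""
    if rng is None:
        rng = random.Random()
    # Step 1: allocate an independent Uniform[0, 1) value to each unit.
    keyed = [(rng.random(), unit) for unit in units]
    # Step 2: sort the population in ascending order of the keys.
    keyed.sort(key=lambda pair: pair[0])
    # Step 3: the first n units of the sorted population form the sample.
    return [unit for _, unit in keyed[:n]]


sample = random_sort_sample(list(range(1, 11)), n=3, rng=random.Random(7))
```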


SRS with replacement






In with-replacement sampling, order of the sample selection is important.

Ordered sample: $OS = (a_1, a_2, \cdots, a_n)$

where $a_i$ is the index of the element selected in the i-th with-replacement draw.

Sample: $A = \{k ;\ k = a_i \text{ for some } i,\ i = 1, 2, \cdots, n\}$

SRS with replacement: For each i-th draw, we use

$a_i = k$ with probability 1/N, k = 1, · · · , N.

Sample size is a random variable: Note that

$$\pi_k = \Pr(k \in A) = 1 - \Pr(k \notin A) = 1 - \left(1 - \frac{1}{N}\right)^n.$$

Thus, the expected sample size is $n_0 = \sum_{k=1}^{N} \pi_k = N - N\left(1 - N^{-1}\right)^n \le n$ for n > 2.



1 First, define

$$Z_i = y_{a_i} = \sum_{k=1}^{N} y_k I(a_i = k).$$

Note that $Z_1, \cdots, Z_n$ are independent random variables since the n draws are independent.

2 $Z_1, \cdots, Z_n$ are identically distributed since the same probabilities are used at each draw, where $E(Z_i) = \bar{Y}$ and

$$V(Z_i) = N^{-1} \sum_{k=1}^{N} \left(y_k - \bar{Y}\right)^2 \equiv \sigma_y^2.$$

3 Thus, $Z_1, \cdots, Z_n$ are IID with mean $\bar{Y}$ and variance $\sigma_y^2$. Use $\bar{z} = \sum_{k=1}^{n} Z_k / n$ to estimate $\bar{Y}$.



Estimation of Total

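Unbiased estimator of Y:

$$\hat{Y}_{SRSWR} = \frac{N}{n} \sum_{i=1}^{n} y_{a_i} = N \bar{y}_n.$$

Variance

$$V\left(\hat{Y}_{SRSWR}\right) = \frac{N^2}{n} \left(1 - \frac{1}{N}\right) S^2 = \frac{N^2}{n} \sigma_y^2 \ge V\left(\hat{Y}_{SRS}\right)$$

where $S^2 = (N-1)^{-1} \sum_{i=1}^{N} (y_i - \bar{Y}_N)^2 = N (N-1)^{-1} \sigma_y^2$.

Variance estimation

$$\hat{V}\left(\hat{Y}_{SRSWR}\right) = \frac{N^2}{n} s^2$$

where $s^2 = (n-1)^{-1} \sum_{i=1}^{n} (y_{a_i} - \bar{y}_n)^2$. Note that $E(s^2) = \sigma_y^2$.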
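A small sketch of SRSWR estimation. It draws with replacement, computes the estimate of the total and its variance estimate, and then averages both over all $N^n$ ordered samples of a toy population to check unbiasedness. The population values and function names are hypothetical.

```python
import random
from itertools import product
from statistics import mean, pvariance, variance


def srswr_draw(y, n, rng=None):
    """n independent draws, each unit having probability 1/N at every draw."""
    if rng is None:
        rng = random.Random()
    return [y[rng.randrange(len(y))] for _ in range(n)]


def srswr_estimates(draws, N):
    """Estimate of the total and its variance estimate under SRSWR."""
    n = len(draws)
    total_hat = N * mean(draws)
    var_hat = N**2 / n * variance(draws)  # s^2 uses divisor n - 1
    return total_hat, var_hat


y = [3.0, 7.0, 1.0, 8.0, 5.0]  # hypothetical population, N = 5
N, n = len(y), 2

# All N^n ordered samples are equally likely under SRSWR.
results = [srswr_estimates(list(os), N) for os in product(y, repeat=n)]
mean_total_hat = mean(t for t, _ in results)
mean_var_hat = mean(v for _, v in results)

# True variance (N^2 / n) * sigma_y^2; pvariance uses divisor N.
true_var = N**2 / n * pvariance(y)
```

A single random sample can be processed with `srswr_estimates(srswr_draw(y, n), N)`.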


Systematic sampling





Systematic sampling

Setup:

1 Have N elements in a list.

2 Choose a positive integer, a, called sampling interval. Let $n = [N/a]$. That is, $N = na + c$, where c is an integer $0 \le c < a$.

3 Select a random start, r, from {1, 2, · · · , a} with equal probability.

4 The final sample is

$$A = \begin{cases} \{r, r + a, r + 2a, \cdots, r + (n-1)a\} & \text{if } c < r \le a \\ \{r, r + a, r + 2a, \cdots, r + na\} & \text{if } 1 \le r \le c. \end{cases}$$
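The setup above can be sketched in a few lines: starting from r, take every a-th label that does not exceed N. This covers both cases of the final sample at once, because r + na ≤ N holds exactly when r ≤ c. The function names and the `rng` argument are illustrative assumptions.

```python
import random


def systematic_sample_from_start(N, a, r):
    """Labels r, r + a, r + 2a, ... that do not exceed N (1-based)."""
    return list(range(r, N + 1, a))


def systematic_sample(N, a, rng=None):
    """Systematic sample with a random start r uniform on {1, ..., a}."""
    if rng is None:
        rng = random.Random()
    r = rng.randint(1, a)  # randint includes both endpoints
    return systematic_sample_from_start(N, a, r)


# Hypothetical case N = 10, a = 3, so n = 3 and c = 1.
samples = {r: systematic_sample_from_start(10, 3, r) for r in (1, 2, 3)}
```

Here only the start r = 1 ≤ c yields n + 1 = 4 units. The other starts yield n = 3 units.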


Systematic sampling

Sample size can be random

$$n_A = \begin{cases} n & \text{if } c < r \le a \\ n + 1 & \text{if } r \le c \end{cases}$$

Inclusion probabilities

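$$\pi_k = \frac{1}{a}$$

$$\pi_{kl} = \begin{cases} 1/a & \text{if } k - l \text{ is a multiple of } a \\ 0 & \text{otherwise} \end{cases}$$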


Systematic sampling

Remark

This is very easy to do.

This is a probability sampling design.

This is not a measurable sampling design: no design-unbiased estimator of variance (because only one random draw).

Pick one set of elements (which always go together) & measure each one: later, we will call this cluster sampling.

Divide population into non-overlapping groups & choose an element in each group: closely related to stratification.


Systematic sampling

Estimation

Partition the population into a groups

$$U = U_1 \cup U_2 \cup \cdots \cup U_a$$

where the $U_i$ are disjoint.

Population total

$$Y = \sum_{i \in U} y_i = \sum_{r=1}^{a} \sum_{k \in U_r} y_k = \sum_{r=1}^{a} t_r$$

where $t_r = \sum_{k \in U_r} y_k$.

Think of a finite population with a elements with measurements $t_1, \cdots, t_a$.


Systematic sampling

Estimation (Cont’d)

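HT estimator:

$$\hat{Y}_{HT} = \frac{t_r}{1/a}, \quad \text{if } A = U_r.$$

Variance: Note that we are doing SRS (of size 1) from the population of a elements $\{t_1, \cdots, t_a\}$.

$$Var\left(\hat{Y}_{HT}\right) = \frac{a^2}{1} \left(1 - \frac{1}{a}\right) S_t^2$$

where

$$S_t^2 = \frac{1}{a-1} \sum_{r=1}^{a} (t_r - \bar{t})^2$$

and $\bar{t} = \sum_{r=1}^{a} t_r / a$.

When is the variance small?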


Systematic sampling

Estimation (Cont’d)

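Now, assuming N = na,

$$V\left(\hat{Y}_{HT}\right) = a(a-1) S_t^2 = n^2 a \sum_{r=1}^{a} (\bar{y}_r - \bar{y}_u)^2$$

where $\bar{y}_r = t_r / n$ and $\bar{y}_u = \bar{t} / n$.

ANOVA: $U = \cup_{r=1}^{a} U_r$

$$\begin{aligned}
SST &= \sum_{k \in U} (y_k - \bar{y}_u)^2 = \sum_{r=1}^{a} \sum_{k \in U_r} (y_k - \bar{y}_u)^2 \\
&= \sum_{r=1}^{a} \sum_{k \in U_r} (y_k - \bar{y}_r)^2 + n \sum_{r=1}^{a} (\bar{y}_r - \bar{y}_u)^2 \\
&= SSW + SSB.
\end{aligned}$$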


Systematic sampling

$$V\left(\hat{Y}_{HT}\right) = na \cdot SSB = N \cdot SSB = N (SST - SSW).$$

If SSB is small, then the $\bar{y}_r$ are more alike and $V(\hat{Y}_{HT})$ is small.

If SSW is small, then $V(\hat{Y}_{HT})$ is large.

Intraclass correlation coefficient ρ measures homogeneity of clusters.

$$\rho = 1 - \frac{n}{n-1} \frac{SSW}{SST}$$

More details about ρ will be covered in cluster sampling (Chapter 6).
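The ANOVA identity and the relation $V(\hat{Y}_{HT}) = N \cdot SSB$ can be checked numerically. The sketch below assumes N = na exactly and n ≥ 2, and computes the design variance directly from the a possible HT estimates. The function name and the toy population 1, ..., 12 are illustrative assumptions.

```python
def systematic_anova(y, a):
    """SST, SSW, SSB, design variance V and rho for sampling interval a.

    Assumes N = n * a exactly and n >= 2, with y listed in frame order.
    """
    N = len(y)
    if N % a != 0:
        raise ValueError("this sketch assumes N = n * a")
    n = N // a
    groups = [y[r::a] for r in range(a)]  # groups[r] holds U_{r+1}
    y_u = sum(y) / N
    means = [sum(g) / n for g in groups]
    sst = sum((yk - y_u) ** 2 for yk in y)
    ssw = sum((yk - m) ** 2 for g, m in zip(groups, means) for yk in g)
    ssb = n * sum((m - y_u) ** 2 for m in means)
    # Each start has probability 1/a and yields the HT estimate a * t_r.
    total = sum(y)
    var_direct = sum((a * sum(g) - total) ** 2 for g in groups) / a
    rho = 1 - n / (n - 1) * ssw / sst
    return {"SST": sst, "SSW": ssw, "SSB": ssb, "V": var_direct, "rho": rho}


# Hypothetical linearly ordered population with N = 12 and a = 3.
out = systematic_anova([float(k) for k in range(1, 13)], a=3)
```

In this linearly ordered example most of the variation is within groups, so SSB is small relative to SST and ρ comes out negative.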


Systematic sampling

Comparison between systematic sampling (SY) and SRS

How does SY compare to SRS when the population is sorted in the following ways?

1 Random ordering: Intuitively should be the same.

2 Linear ordering: SY should be better than SRS.

3 Periodic ordering: if period = a, SY can be terrible.

4 Autocorrelated order: Successive $y_k$'s tend to lie on the same side of $\bar{y}_u$. Thus, SY should be better than SRS.
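To make the linear and periodic cases concrete, the sketch below computes the exact design variances of the estimated total under SRS and under SY for two hypothetical populations with N = 120 and a = 10. The populations and function names are assumptions chosen for illustration, and the SY variance is obtained by enumerating the a possible starts.

```python
from statistics import variance


def var_srs_total(y, n):
    """Design variance of the HT total under SRS of size n."""
    N = len(y)
    return N**2 / n * (1 - n / N) * variance(y)  # variance(y) is S^2


def var_sy_total(y, a):
    """Design variance of the HT total under SY with interval a (N = n * a)."""
    total = sum(y)
    # Each of the a starts is equally likely and gives the estimate a * t_r.
    estimates = [a * sum(y[r::a]) for r in range(a)]
    return sum((e - total) ** 2 for e in estimates) / a


N, a = 120, 10
n = N // a

linear = [float(k) for k in range(1, N + 1)]  # y_k = k
periodic = [float(k % a) for k in range(N)]   # period equal to a

v_linear = (var_sy_total(linear, a), var_srs_total(linear, n))
v_periodic = (var_sy_total(periodic, a), var_srs_total(periodic, n))
```

In the linear population the SY variance is much smaller than the SRS variance. In the periodic population every systematic sample consists of identical values, so the SY variance is far larger.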


Systematic sampling

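How to quantify?

$$V_{SRS}\left(\hat{Y}_{HT}\right) = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) \frac{1}{N-1} \sum_{k=1}^{N} \left(y_k - \bar{Y}_N\right)^2$$

$$V_{SY}\left(\hat{Y}_{HT}\right) = n^2 a \sum_{r=1}^{a} (\bar{y}_r - \bar{y}_u)^2$$

Cochran (1946) introduced a superpopulation model to deal with this problem (treat $y_k$ as a random variable).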


Systematic sampling

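Example: Superpopulation model for a population in random order. Denote the model by ζ: $\{y_k\} \overset{iid}{\sim} (\mu, \sigma^2)$

$$E_\zeta \left\{ V_{SRS}\left(\hat{Y}_{HT}\right) \right\} = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) \sigma^2$$

$$E_\zeta \left\{ V_{SY}\left(\hat{Y}_{HT}\right) \right\} = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) \sigma^2$$

Thus, the model expectations of the design variances are the same under the IID model.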

