Chapter 3: Element sampling design: Part 1 Jae-Kwang Kim Fall, 2014

Source: jkim.public.iastate.edu/teaching/chapter3.pdf


Simple random sampling

1 Simple random sampling

2 SRS with replacement

3 Systematic sampling

Kim Ch. 3: Element sampling design: Part 1 Fall, 2014 2 / 31


Simple Random Sampling

Motivation: Choose n units from N units without replacement.

1 Each subset of n distinct units is equally likely to be selected.

2 There are $\binom{N}{n}$ samples of size n from N.

3 Give equal probability of selection to each subset with n units.

Definition

Sampling design for SRS:

$$
P(A) = \begin{cases} 1\big/\binom{N}{n} & \text{if } |A| = n, \\ 0 & \text{otherwise.} \end{cases}
$$


Lemma

Under SRS, the inclusion probabilities are

$$
\pi_i = \frac{n}{N}, \qquad \pi_{ij} = \frac{n(n-1)}{N(N-1)} \quad \text{for } i \neq j.
$$
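A small enumeration makes the lemma concrete. The sketch below (the function name `srs_inclusion_probs` and the choice N = 6, n = 3 are illustrative) lists all $\binom{N}{n}$ equally likely samples, adds $P(A)$ over the samples containing each unit or pair, and keeps exact fractions so the results can be compared with $n/N$ and $n(n-1)/\{N(N-1)\}$ directly.

```python
from fractions import Fraction
from itertools import combinations

def srs_inclusion_probs(N, n):
    """Exact first- and second-order inclusion probabilities under SRS,
    computed by enumerating all C(N, n) equally likely samples."""
    samples = list(combinations(range(1, N + 1), n))
    p = Fraction(1, len(samples))              # P(A) = 1 / C(N, n)
    pi = {i: Fraction(0) for i in range(1, N + 1)}
    pi_pair = {}
    for A in samples:
        for i in A:                            # unit i is in A
            pi[i] += p
        for i, j in combinations(A, 2):        # pair (i, j), i < j, is in A
            pi_pair[(i, j)] = pi_pair.get((i, j), Fraction(0)) + p
    return pi, pi_pair

N, n = 6, 3
pi, pi_pair = srs_inclusion_probs(N, n)
print(pi[1], pi_pair[(1, 2)])   # 1/2 1/5
```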


Theorem

Under SRS design, the HT estimator

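$$
\hat{Y}_{HT} = \frac{N}{n} \sum_{i \in A} y_i = N \bar{y}
$$

is unbiased for Y and has variance of the form

$$
V\big(\hat{Y}_{HT}\big) = \frac{N^2}{n}\left(1 - \frac{n}{N}\right) S^2,
$$

where

$$
S^2 = \frac{1}{2} \cdot \frac{1}{N} \cdot \frac{1}{N-1} \sum_{i=1}^{N} \sum_{j=1}^{N} \left(y_i - y_j\right)^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left(y_i - \bar{Y}_N\right)^2
$$

and $\bar{Y}_N = N^{-1} \sum_{i=1}^{N} y_i$ is the population mean.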
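The theorem can be checked exactly on a toy population. In this sketch the values in `y` are made up, and `srs_ht_moments`, `S2_deviation` and `S2_pairwise` are hypothetical helper names. The code computes the design mean and variance of $\hat{Y}_{HT}$ by averaging over all samples, compares them with $Y$ and $(N^2/n)(1 - n/N)S^2$, and confirms that the two expressions for $S^2$ agree.

```python
from fractions import Fraction
from itertools import combinations

def srs_ht_moments(y, n):
    """Exact design mean and variance of the HT estimator N * ybar under SRS,
    obtained by enumerating all C(N, n) equally likely samples."""
    N = len(y)
    ests = [Fraction(N, n) * sum(y[i] for i in A) for A in combinations(range(N), n)]
    mean = sum(ests) / len(ests)
    var = sum((e - mean) ** 2 for e in ests) / len(ests)
    return mean, var

def S2_deviation(y):
    """S^2 as the sum of squared deviations from the mean, divided by N - 1."""
    N = len(y)
    ybar = Fraction(sum(y), N)
    return sum((yi - ybar) ** 2 for yi in y) / (N - 1)

def S2_pairwise(y):
    """S^2 as half the average squared difference over ordered pairs i != j."""
    N = len(y)
    return Fraction(sum((yi - yj) ** 2 for yi in y for yj in y), 2 * N * (N - 1))

y = [3, 7, 1, 8, 4, 10]   # a small made-up population
n = 2
N = len(y)
mean, var = srs_ht_moments(y, n)
S2 = S2_deviation(y)
print(mean == sum(y))                                        # True: unbiased for Y
print(var == Fraction(N**2, n) * (1 - Fraction(n, N)) * S2)  # True
print(S2 == S2_pairwise(y))                                  # True
```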


Theorem (Cont’d)

Also, the SYG variance estimator is

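$$
\hat{V}\big(\hat{Y}_{HT}\big) = \frac{N^2}{n}\left(1 - \frac{n}{N}\right) s^2,
$$

where

$$
s^2 = \frac{1}{n-1} \sum_{i \in A} \left(y_i - \bar{y}\right)^2 .
$$

Since the SYG variance estimator is design-unbiased, it follows that under SRS, $E(s^2) = S^2$.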
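Along the same lines, a brute-force sketch (hypothetical population values, hypothetical helper `sample_var`) can confirm that $s^2$ is design-unbiased for $S^2$, and hence that the variance estimator above is unbiased for $V(\hat{Y}_{HT})$.

```python
from fractions import Fraction
from itertools import combinations

def sample_var(values):
    """Variance with divisor (number of values - 1), in exact arithmetic."""
    m = len(values)
    mean = Fraction(sum(values), m)
    return sum((v - mean) ** 2 for v in values) / (m - 1)

y = [3, 7, 1, 8, 4, 10]                 # a small made-up population
N, n = len(y), 3
samples = list(combinations(y, n))       # all C(N, n) equally likely samples
E_s2 = sum(sample_var(A) for A in samples) / len(samples)
S2 = sample_var(y)                       # population S^2 also uses divisor N - 1
V_true = Fraction(N**2, n) * (1 - Fraction(n, N)) * S2
E_V_hat = Fraction(N**2, n) * (1 - Fraction(n, N)) * E_s2
print(E_s2 == S2, E_V_hat == V_true)     # True True
```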


Remark (under SRS)

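$1 - n/N$ is often called the finite population correction (FPC) term. The FPC term can be ignored (FPC $\doteq 1$) if the sampling rate $n/N$ is small ($\le 0.05$) or for conservative inference.

For n = 1, the variance of the sample mean is

$$
\frac{1}{n}\left(1 - \frac{n}{N}\right) S^2 = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \bar{Y}_N\right)^2 \equiv \sigma_y^2 .
$$

Central limit theorem: under some conditions,

$$
V^{-1/2}\big(\hat{Y}_{HT} - Y\big) = \frac{\bar{y} - \bar{Y}_N}{\sqrt{\frac{1}{n}\left(1 - \frac{n}{N}\right) S^2}} \to N(0, 1).
$$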
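In practice the CLT is used to build an approximate confidence interval for $Y$ by plugging in the variance estimator. The following is a minimal sketch under that normal approximation; the function name `srs_total_ci` and the data are hypothetical, and `statistics.NormalDist` supplies the normal quantile.

```python
import math
from statistics import NormalDist, mean, variance

def srs_total_ci(sample, N, level=0.95):
    """Normal-approximation CI for the total Y under SRS without replacement.

    Uses Y_hat = N * ybar and V_hat = (N^2 / n)(1 - n/N) s^2; the interval is
    only as good as the CLT approximation for the given n and population."""
    n = len(sample)
    y_hat = N * mean(sample)
    v_hat = N**2 / n * (1 - n / N) * variance(sample)   # variance(): divisor n - 1
    z = NormalDist().inv_cdf(0.5 + level / 2)           # about 1.96 for level = 0.95
    half_width = z * math.sqrt(v_hat)
    return y_hat - half_width, y_hat + half_width

# Hypothetical data: 8 units drawn by SRS from a population of N = 200.
print(srs_total_ci([12, 15, 9, 11, 14, 10, 13, 16], N=200))
```

Dropping the `(1 - n / N)` factor gives the wider, FPC-free interval, which is the conservative choice mentioned above.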


Remark (under SRS)

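Sample size determination

1 Choose the target variance $V^*$ of $V(\bar{y})$.

2 Choose n as the smallest integer satisfying

$$
\frac{1}{n}\left(1 - \frac{n}{N}\right) S^2 \le V^* .
$$

For dichotomous y (taking 0 or 1), we may use $S^2 \doteq P(1 - P) \le 1/4$. A simple rule is $n \ge d^{-2}$, where d is the margin of error: ignoring the FPC, a 95% margin of error is $d \approx 2\sqrt{P(1-P)/n} \le 1/\sqrt{n}$.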
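The inequality can be solved in closed form: $(1/n - 1/N)S^2 \le V^*$ is equivalent to $n \ge n_0/(1 + n_0/N)$ with $n_0 = S^2/V^*$. The sketch below (hypothetical function `srs_sample_size`) uses that bound and then checks neighbouring integers to guard against round-off. For the dichotomous example it assumes, as is common, that $V^* = (d/2)^2$, that is, that the margin of error is about two standard errors.

```python
import math

def srs_sample_size(S2, V_star, N):
    """Smallest integer n with (1/n)(1 - n/N) S2 <= V_star under SRS.

    Rearranging (1/n - 1/N) S2 <= V* gives n >= n0 / (1 + n0/N), n0 = S2 / V*."""
    n0 = S2 / V_star
    n = min(N, max(1, math.ceil(n0 / (1 + n0 / N))))
    # Nudge n by one in either direction to absorb floating-point round-off.
    while n > 1 and (1 / (n - 1) - 1 / N) * S2 <= V_star:
        n -= 1
    while n < N and (1 / n - 1 / N) * S2 > V_star:
        n += 1
    return n

# Dichotomous y with the worst case S2 = 1/4 and a 5-point margin of error,
# taking V* = (d / 2)^2 (d is about two standard errors).
d = 0.05
print(srs_sample_size(0.25, (d / 2) ** 2, N=10_000))   # 385
print(srs_sample_size(0.25, (d / 2) ** 2, N=10**9))    # 400, i.e. d**-2 when the FPC is negligible
```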


How to select a simple random sample of size n from the finite population?

Draw-by-draw procedure

Rejective Bernoulli sampling method

Reservoir method

Random sorting method


Draw-by-draw procedure

For example, consider U = {1, 2, · · · ,N} and n = 2.

In the first draw, select one element with equal probability.

In the second draw, select one element with equal probability from $U - \{a_1\}$, where $a_1$ is the element selected in the first draw. Let $a_2$ be the element selected in the second draw.


Draw-by-draw procedure (Cont’d)

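$$
\begin{aligned}
P(a_1, a_2) &= P(a_1)\,P(a_2 \mid U - \{a_1\}) + P(a_2)\,P(a_1 \mid U - \{a_2\}) \\
&= \frac{1}{N} \cdot \frac{1}{N-1} + \frac{1}{N} \cdot \frac{1}{N-1} \\
&= \frac{2}{N(N-1)} = 1\Big/\binom{N}{2}.
\end{aligned}
$$

We can prove similar results for general n (use mathematical induction).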
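The draw-by-draw procedure is easy to simulate, and for small N the claim that every unordered sample has probability $1/\binom{N}{n}$ can be checked exactly by summing over all orderings. Both function names below are hypothetical; the exact check reproduces $2/\{N(N-1)\}$ for n = 2 and also covers n = 3 without the induction argument.

```python
import random
from fractions import Fraction
from itertools import permutations

def draw_by_draw(N, n, rng=random):
    """Select n of the units 1..N: at each draw pick one of the
    not-yet-selected units with equal probability."""
    remaining = list(range(1, N + 1))
    sample = []
    for _ in range(n):
        k = rng.randrange(len(remaining))      # equal probability over U minus earlier draws
        sample.append(remaining.pop(k))
    return set(sample)

def exact_unordered_probs(N, n):
    """Exact P(A) for every unordered sample A, summing the probabilities
    of all orderings (a_1, ..., a_n) that produce A."""
    probs = {}
    for order in permutations(range(1, N + 1), n):
        p = Fraction(1)
        for step in range(n):
            p *= Fraction(1, N - step)         # 1/N, then 1/(N-1), ...
        key = frozenset(order)
        probs[key] = probs.get(key, Fraction(0)) + p
    return probs

probs = exact_unordered_probs(5, 2)
print(set(probs.values()))   # {Fraction(1, 10)}
```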


Rejective Bernoulli sampling method

1 Apply Bernoulli sampling of expected size n:

$$
I_1, \ldots, I_N \overset{\text{iid}}{\sim} \text{Bernoulli}(f), \quad \text{where } f = n/N.
$$

2 Check whether the realized sample size is n. If yes, accept the sample. Otherwise, go to Step 1.


Rejective Bernoulli sampling method (Cont’d)

Justification:

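$$
P\left(I_1, I_2, \ldots, I_N \,\middle|\, \sum_{i=1}^{N} I_i = n\right) = \frac{\prod_{i=1}^{N} f^{I_i} (1 - f)^{1 - I_i}}{\binom{N}{n} f^{n} (1 - f)^{N - n}} = \frac{1}{\binom{N}{n}} \quad \text{if } \sum_{i=1}^{N} I_i = n.
$$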
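A minimal sketch of the rejective procedure, assuming units are labelled 1 to N (the function name `rejective_bernoulli` is hypothetical). Each pass draws independent Bernoulli($f$) indicators, and the loop simply repeats until exactly n units are selected.

```python
import random
from collections import Counter

def rejective_bernoulli(N, n, rng=random):
    """SRS of size n from units 1..N by repeating Bernoulli(f = n/N)
    sampling until the realized sample size equals n."""
    f = n / N
    while True:
        sample = [k for k in range(1, N + 1) if rng.random() < f]   # I_k ~ Bernoulli(f)
        if len(sample) == n:                                         # accept only size n
            return sample

rng = random.Random(2014)
counts = Counter(tuple(rejective_bernoulli(4, 2, rng)) for _ in range(60000))
print(sorted(counts.items()))   # each of the 6 pairs should appear roughly 10,000 times
```

The design choice here is simplicity over speed: the number of passes is geometric with success probability $\binom{N}{n} f^n (1-f)^{N-n}$, which can be small when N is large.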


Reservoir method (McLeod and Bellhouse, 1983)

1 The first n units are selected into the sample.

2 For each k = n + 1, ..., N:

(a) Select unit k with probability n/k.

(b) If unit k is selected, remove one element from the current sample with equal probability.

(c) Unit k takes the place of the removed unit.

Note that the population size N need not be known in advance. You can stop at any point in the process and obtain a simple random sample from the part of the finite population processed up to that point.
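The steps above translate almost line by line into a function over an iterator, which is what lets the method work when N is unknown. This is a sketch with a hypothetical name, `reservoir_srs`; overwriting a uniformly chosen position in the list is the same as removing a random member and putting unit k in its place.

```python
import random
from itertools import islice

def reservoir_srs(stream, n, rng=random):
    """Reservoir method: SRS of size n from a stream of unknown length.
    After k units have been read, `sample` is an SRS of size n from them."""
    it = iter(stream)
    sample = list(islice(it, n))                 # step 1: the first n units
    for k, unit in enumerate(it, start=n + 1):   # step 2: k = n+1, n+2, ...
        if rng.random() < n / k:                 # select unit k with probability n/k
            sample[rng.randrange(n)] = unit      # it replaces a random current member
    return sample

print(reservoir_srs(range(1, 1001), 5, random.Random(42)))
```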


Random sorting method

1 A value of an independent uniform variable on [0, 1] is allocated to each unit of the population.

2 The population is sorted in ascending (or descending) order.

3 The first n units of the sorted population are selected in the sample.
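A sketch of the random sorting method (hypothetical name `random_sort_srs`). Because the uniform keys are independent and ties have probability zero, the sorted order is a uniformly random permutation, so its first n units form an SRS.

```python
import random

def random_sort_srs(units, n, rng=random):
    """Random sorting method: attach an independent U(0, 1) key to every unit,
    sort by the key, and keep the first n units."""
    keyed = [(rng.random(), u) for u in units]
    keyed.sort(key=lambda pair: pair[0])         # ascending order of the uniform keys
    return [u for _, u in keyed[:n]]

print(random_sort_srs(range(1, 21), 5, random.Random(3)))
```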


SRS with replacement


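In with-replacement sampling, the order of sample selection is important.

Ordered sample: $OS = (a_1, a_2, \ldots, a_n)$, where $a_i$ is the index of the element selected in the i-th with-replacement draw.

Sample: $A = \{k : k = a_i \text{ for some } i,\ i = 1, 2, \ldots, n\}$.

SRS with replacement: For the i-th draw, we use

$$
a_i = k \quad \text{with probability } 1/N, \qquad k = 1, \ldots, N.
$$

The sample size is a random variable. Note that

$$
\begin{aligned}
\pi_k = \Pr(k \in A) &= 1 - \Pr(k \notin A) \\
&= 1 - \left(1 - \frac{1}{N}\right)^{n}.
\end{aligned}
$$

Thus, the expected number of distinct units is $n_0 = \sum_{k=1}^{N} \pi_k = N - N\left(1 - N^{-1}\right)^{n} \le n$, with strict inequality for $n \ge 2$.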
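The expected number of distinct units is easy to check by simulation. In this sketch N = 10 and n = 5 are arbitrary choices and both function names are hypothetical; the simulated average of $|A|$ should be close to $n_0$, which is below $n$ because repeated draws of the same unit add no new units.

```python
import random

def srswr_draws(N, n, rng=random):
    """Ordered SRSWR sample (a_1, ..., a_n): each draw is uniform on 1..N."""
    return [rng.randint(1, N) for _ in range(n)]

def expected_distinct(N, n):
    """n_0 = sum_k pi_k = N - N (1 - 1/N)^n."""
    return N - N * (1 - 1 / N) ** n

N, n = 10, 5
rng = random.Random(1)
reps = 50000
avg = sum(len(set(srswr_draws(N, n, rng))) for _ in range(reps)) / reps
print(round(expected_distinct(N, n), 4), round(avg, 2))   # the two numbers should be close
```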


1 First, define

$$
Z_i = y_{a_i} = \sum_{k=1}^{N} y_k\, I(a_i = k).
$$

Note that $Z_1, \ldots, Z_n$ are independent random variables since the n draws are independent.

2 $Z_1, \ldots, Z_n$ are identically distributed since the same probabilities are used at each draw, where $E(Z_i) = \bar{Y}_N$ and

$$
V(Z_i) = N^{-1} \sum_{k=1}^{N} \left(y_k - \bar{Y}_N\right)^2 \equiv \sigma_y^2 .
$$

3 Thus, $Z_1, \ldots, Z_n$ are IID with mean $\bar{Y}_N$ and variance $\sigma_y^2$. Use $\bar{z} = \sum_{k=1}^{n} Z_k / n$ to estimate $\bar{Y}_N$.


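Estimation of Total

Unbiased estimator of Y:

$$
\hat{Y}_{SRSWR} = \frac{N}{n} \sum_{i=1}^{n} y_{a_i} = N \bar{y}_n .
$$

Variance:

$$
V\big(\hat{Y}_{SRSWR}\big) = \frac{N^2}{n}\left(1 - \frac{1}{N}\right) S^2 = \frac{N^2}{n} \sigma_y^2 \ge V\big(\hat{Y}_{SRS}\big),
$$

where $S^2 = (N-1)^{-1} \sum_{i=1}^{N} (y_i - \bar{Y}_N)^2 = N (N-1)^{-1} \sigma_y^2$. The inequality holds because $V(\hat{Y}_{SRS}) = (N^2/n)(1 - n/N)S^2$ and $1 - 1/N \ge 1 - n/N$.

Variance estimation:

$$
\hat{V}\big(\hat{Y}_{SRSWR}\big) = \frac{N^2}{n} s^2,
$$

where $s^2 = (n-1)^{-1} \sum_{i=1}^{n} (y_{a_i} - \bar{y}_n)^2$. Note that $E(s^2) = \sigma_y^2$.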
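For a small population, all $N^n$ ordered samples are equally likely and can be enumerated, which gives an exact check of the three results above (unbiasedness, $V = N^2\sigma_y^2/n$, and $E(s^2) = \sigma_y^2$) as well as the comparison with SRS. The population values and the helper name `srswr_exact_moments` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def srswr_exact_moments(y, n):
    """Enumerate all N^n equally likely ordered samples (a_1, ..., a_n)
    and return E(Y_hat), V(Y_hat) and E(s^2) for Y_hat = N * ybar_n."""
    N = len(y)
    ests, s2s = [], []
    for draws in product(y, repeat=n):        # each ordered sample has probability N^-n
        ybar = Fraction(sum(draws), n)
        ests.append(N * ybar)
        s2s.append(sum((v - ybar) ** 2 for v in draws) / (n - 1))
    M = len(ests)
    mean = sum(ests) / M
    var = sum((e - mean) ** 2 for e in ests) / M
    return mean, var, sum(s2s) / M

y = [3, 7, 1, 8, 4]                            # small made-up population
N, n = len(y), 3
Ybar = Fraction(sum(y), N)
sigma2 = sum((v - Ybar) ** 2 for v in y) / N   # sigma_y^2 (divisor N)
S2 = sigma2 * N / (N - 1)
mean, var, E_s2 = srswr_exact_moments(y, n)
print(mean == sum(y), var == Fraction(N**2, n) * sigma2, E_s2 == sigma2)   # True True True
V_srs = Fraction(N**2, n) * (1 - Fraction(n, N)) * S2
print(var >= V_srs)                                                         # True
```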


Systematic sampling


Setup:

1 Have N elements in a list.

2 Choose a positive integer a, called the sampling interval. Let $n = \lfloor N/a \rfloor$. That is, $N = na + c$, where c is an integer with $0 \le c < a$.

3 Select a random start r from $\{1, 2, \ldots, a\}$ with equal probability.

4 The final sample is

$$
A = \begin{cases} \{r, r + a, r + 2a, \ldots, r + (n-1)a\} & \text{if } c < r \le a, \\ \{r, r + a, r + 2a, \ldots, r + na\} & \text{if } 1 \le r \le c. \end{cases}
$$
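The selection rule can be written in one line because `range(r, N + 1, a)` stops at the last unit not exceeding N, which automatically yields n + 1 units when $r \le c$ and n units otherwise. A sketch, with the hypothetical name `systematic_sample`:

```python
import random

def systematic_sample(N, a, r=None, rng=random):
    """Systematic sample from units 1..N with sampling interval a.
    If r is None, the random start is drawn uniformly from 1..a."""
    if r is None:
        r = rng.randint(1, a)
    return list(range(r, N + 1, a))           # r, r + a, r + 2a, ... up to N

for r in (1, 2, 3):
    print(r, systematic_sample(10, 3, r=r))
```

For N = 10 and a = 3 (so n = 3, c = 1), the start r = 1 gives 4 units and r = 2, 3 give 3 units each.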


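Sample size can be random:

$$
n_A = \begin{cases} n & \text{if } c < r \le a, \\ n + 1 & \text{if } r \le c. \end{cases}
$$

Inclusion probabilities: each unit belongs to exactly one of the a equally likely samples, so

$$
\pi_k = \frac{1}{a}, \qquad
\pi_{kl} = \begin{cases} 1/a & \text{if } k \text{ and } l \text{ belong to the same systematic sample } (|k - l| \text{ a multiple of } a), \\ 0 & \text{otherwise.} \end{cases}
$$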
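Enumerating the a possible random starts gives the inclusion probabilities exactly. In the sketch below (hypothetical helper name, N = 10 and a = 3 chosen for illustration), every $\pi_k$ equals $1/a$, while most second-order probabilities are zero, which is the source of the measurability problem noted in the remark that follows.

```python
from fractions import Fraction
from itertools import combinations

def systematic_inclusion_probs(N, a):
    """Exact pi_k and pi_kl for systematic sampling from 1..N with interval a,
    by enumerating the a equally likely random starts."""
    pi = {k: Fraction(0) for k in range(1, N + 1)}
    pi_pair = {pair: Fraction(0) for pair in combinations(range(1, N + 1), 2)}
    for r in range(1, a + 1):
        A = range(r, N + 1, a)                 # the sample for random start r
        for k in A:
            pi[k] += Fraction(1, a)
        for pair in combinations(A, 2):
            pi_pair[pair] += Fraction(1, a)
    return pi, pi_pair

pi, pi_pair = systematic_inclusion_probs(10, 3)
zero_pairs = sum(1 for p in pi_pair.values() if p == 0)
print(set(pi.values()), zero_pairs)   # {Fraction(1, 3)} 33
```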


Remark

This is very easy to do.

This is a probability sampling design.

This is not a measurable sampling design: there is no design-unbiased estimator of the variance (because there is only one random draw, so $\pi_{kl} = 0$ for pairs k, l that never appear together).

Pick one set of elements (which always go together) and measure each one: later, we will call this cluster sampling.

Divide the population into non-overlapping groups and choose an element in each group: this is closely related to stratification.


Estimation

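Partition the population into a groups:

$$
U = U_1 \cup U_2 \cup \cdots \cup U_a,
$$

where the $U_r$ are disjoint ($U_r$ is the systematic sample with random start r).

Population total:

$$
Y = \sum_{i \in U} y_i = \sum_{r=1}^{a} \sum_{k \in U_r} y_k = \sum_{r=1}^{a} t_r,
$$

where $t_r = \sum_{k \in U_r} y_k$.

Think of a finite population of a elements with measurements $t_1, \ldots, t_a$.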


Estimation (Cont’d)

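HT estimator:

$$
\hat{Y}_{HT} = \frac{t_r}{1/a} = a\, t_r \quad \text{if } A = U_r .
$$

Variance: Note that we are doing SRS of size 1 from the population of a elements $\{t_1, \ldots, t_a\}$.

$$
\operatorname{Var}\big(\hat{Y}_{HT}\big) = \frac{a^2}{1}\left(1 - \frac{1}{a}\right) S_t^2,
$$

where

$$
S_t^2 = \frac{1}{a-1} \sum_{r=1}^{a} \left(t_r - \bar{t}\right)^2
$$

and $\bar{t} = \sum_{r=1}^{a} t_r / a$.

When is the variance small?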
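Since systematic sampling is SRS of size 1 from $\{t_1, \ldots, t_a\}$, the variance formula can be checked by averaging over the a random starts. The values in `y` and the function name `systematic_ht_check` are hypothetical; N need not be a multiple of a here.

```python
from fractions import Fraction

def systematic_ht_check(y, a):
    """Return (E(Y_HT), Var(Y_HT), a^2 (1 - 1/a) S_t^2) for systematic sampling
    with interval a, where y[k-1] is the value of unit k."""
    N = len(y)
    t = [sum(y[k - 1] for k in range(r, N + 1, a)) for r in range(1, a + 1)]   # t_r
    ests = [a * tr for tr in t]                    # Y_HT = t_r / (1/a) for each start r
    mean = Fraction(sum(ests), a)
    var = sum((e - mean) ** 2 for e in ests) / a   # exact design variance over r
    t_bar = Fraction(sum(t), a)
    S2_t = sum((tr - t_bar) ** 2 for tr in t) / (a - 1)
    return mean, var, a**2 * (1 - Fraction(1, a)) * S2_t

y = [2, 4, 3, 7, 5, 6, 9, 8, 11, 10]   # made-up values for units 1..10
mean, var, formula = systematic_ht_check(y, a=3)
print(mean == sum(y), var == formula)   # True True
```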


Estimation (Cont’d)

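Now, assuming N = na,

$$
V\big(\hat{Y}_{HT}\big) = a(a-1) S_t^2 = n^2 a \sum_{r=1}^{a} \left(\bar{y}_r - \bar{y}_U\right)^2,
$$

where $\bar{y}_r = t_r / n$ and $\bar{y}_U = \bar{t} / n$.

ANOVA: $U = \bigcup_{r=1}^{a} U_r$,

$$
\begin{aligned}
SST = \sum_{k \in U} \left(y_k - \bar{y}_U\right)^2 &= \sum_{r=1}^{a} \sum_{k \in U_r} \left(y_k - \bar{y}_U\right)^2 \\
&= \sum_{r=1}^{a} \sum_{k \in U_r} \left(y_k - \bar{y}_r\right)^2 + n \sum_{r=1}^{a} \left(\bar{y}_r - \bar{y}_U\right)^2 \\
&= SSW + SSB.
\end{aligned}
$$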


$$
V\big(\hat{Y}_{HT}\big) = na \cdot SSB = N \cdot SSB = N\,(SST - SSW).
$$

If SSB is small, then the $\bar{y}_r$ are more alike and $V(\hat{Y}_{HT})$ is small.

If SSW is small, then $V(\hat{Y}_{HT})$ is large (SST is fixed for a given population).

The intraclass correlation coefficient ρ measures the homogeneity of clusters:

$$
\rho = 1 - \frac{n}{n-1} \cdot \frac{SSW}{SST}.
$$

More details about ρ will be covered in cluster sampling (Chapter 6).
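The ANOVA identity and the link $V(\hat{Y}_{HT}) = N \cdot SSB$ can be verified on a toy population with N = na. The sketch below (hypothetical name `systematic_anova`) uses the linearly ordered values $y_k = k$ for $k = 1, \ldots, 12$ with a = 3; the negative ρ it reports reflects that each systematic sample spreads across the whole trend, so SSW is large and the design variance is small.

```python
from fractions import Fraction

def systematic_anova(y, a):
    """SST, SSW, SSB and rho for the a systematic samples U_r (N = n * a assumed),
    plus the design variance of Y_HT = a * t_r computed over the a random starts."""
    N = len(y)
    assert N % a == 0, "this sketch assumes N = n * a"
    n = N // a
    groups = [[y[k - 1] for k in range(r, N + 1, a)] for r in range(1, a + 1)]  # U_1..U_a
    ybar_U = Fraction(sum(y), N)
    ybar_r = [Fraction(sum(g), n) for g in groups]
    SST = sum((v - ybar_U) ** 2 for v in y)
    SSW = sum((v - m) ** 2 for g, m in zip(groups, ybar_r) for v in g)
    SSB = n * sum((m - ybar_U) ** 2 for m in ybar_r)
    rho = 1 - Fraction(n, n - 1) * SSW / SST
    ests = [a * sum(g) for g in groups]            # Y_HT for each random start
    mean = Fraction(sum(ests), a)
    V = sum((e - mean) ** 2 for e in ests) / a
    return SST, SSW, SSB, rho, V

SST, SSW, SSB, rho, V = systematic_anova(list(range(1, 13)), a=3)
print(SST, SSW, SSB, rho, V)   # 143 135 8 -37/143 96
```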


Comparison between systematic sampling (SY) and SRS

How does SY compare to SRS when the population is sorted in the following ways?

1 Random ordering: intuitively, SY and SRS should be about the same.

2 Linear ordering: SY should be better than SRS.

3 Periodic ordering: if the period equals a, SY can be terrible.

4 Autocorrelated order: successive $y_k$'s tend to lie on the same side of $\bar{y}_U$. Thus, SY should be better than SRS.
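The linear and periodic cases can be illustrated numerically with the SRS variance $(N^2/n)(1 - n/N)S^2$ and the systematic variance $n^2 a \sum_r (\bar{y}_r - \bar{y}_U)^2$. The two populations below, $y_k = k$ and $y_k = k \bmod a$, are hypothetical examples with N = 100 and a = 10, and the function names are illustrative.

```python
def v_srs(y, n):
    """V_SRS(Y_HT) = (N^2/n)(1 - n/N) S^2."""
    N = len(y)
    ybar = sum(y) / N
    S2 = sum((v - ybar) ** 2 for v in y) / (N - 1)
    return N**2 / n * (1 - n / N) * S2

def v_sy(y, a):
    """V_SY(Y_HT) = n^2 a sum_r (ybar_r - ybar_U)^2, assuming N = n * a."""
    N = len(y)
    n = N // a
    ybar_U = sum(y) / N
    means = [sum(y[r - 1::a]) / n for r in range(1, a + 1)]   # ybar_r for start r
    return n**2 * a * sum((m - ybar_U) ** 2 for m in means)

N, a = 100, 10
n = N // a
linear = [k for k in range(1, N + 1)]          # y_k = k
periodic = [k % a for k in range(1, N + 1)]    # period exactly a
for name, y in [("linear", linear), ("periodic", periodic)]:
    print(name, round(v_sy(y, a)), round(v_srs(y, n)))
# linear 82500 757500
# periodic 82500 7500
```

In the periodic case each systematic sample is constant (SSW = 0), so SY is about 11 times worse than SRS here; in the linear case SY is roughly 9 times better.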


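How to quantify?

$$
V_{SRS}\big(\hat{Y}_{HT}\big) = \frac{N^2}{n}\left(1 - \frac{n}{N}\right) \frac{1}{N-1} \sum_{k=1}^{N} \left(y_k - \bar{Y}_N\right)^2,
$$

$$
V_{SY}\big(\hat{Y}_{HT}\big) = n^2 a \sum_{r=1}^{a} \left(\bar{y}_r - \bar{y}_U\right)^2 .
$$

Cochran (1946) introduced a superpopulation model to deal with this problem (treat $y_k$ as a random variable).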


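Example: superpopulation model for a population in random order. Denote the model by ζ: $\{y_k\} \overset{\text{iid}}{\sim} (\mu, \sigma^2)$.

$$
E_\zeta\left\{V_{SRS}\big(\hat{Y}_{HT}\big)\right\} = \frac{N^2}{n}\left(1 - \frac{n}{N}\right)\sigma^2,
$$

$$
E_\zeta\left\{V_{SY}\big(\hat{Y}_{HT}\big)\right\} = \frac{N^2}{n}\left(1 - \frac{n}{N}\right)\sigma^2.
$$

Thus, the model expectations of the design variances are the same under the IID model.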
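A Monte Carlo sketch can illustrate this result. The IID model only fixes the mean and variance, so drawing the $y_k$ from a normal distribution is an assumption made purely for illustration, as are N = 60, a = 6 and the function names. Averaging each design variance over many generated populations should give values close to $(N^2/n)(1 - n/N)\sigma^2$, which is 300 for these settings.

```python
import random

def v_srs(y, n):
    """V_SRS(Y_HT) = (N^2/n)(1 - n/N) S^2 for one population y."""
    N = len(y)
    ybar = sum(y) / N
    return N**2 / n * (1 - n / N) * sum((v - ybar) ** 2 for v in y) / (N - 1)

def v_sy(y, a):
    """V_SY(Y_HT) = n^2 a sum_r (ybar_r - ybar_U)^2, assuming N = n * a."""
    N = len(y)
    n = N // a
    ybar_U = sum(y) / N
    means = [sum(y[r::a]) / n for r in range(a)]   # mean of each systematic sample
    return n**2 * a * sum((m - ybar_U) ** 2 for m in means)

def model_expectations(N=60, a=6, mu=0.0, sigma=1.0, reps=4000, seed=1946):
    """Monte Carlo estimates of E_zeta{V_SRS} and E_zeta{V_SY} when the y_k are
    generated IID N(mu, sigma^2) (one concrete choice of the IID model)."""
    rng = random.Random(seed)
    n = N // a
    tot_srs = tot_sy = 0.0
    for _ in range(reps):
        y = [rng.gauss(mu, sigma) for _ in range(N)]
        tot_srs += v_srs(y, n)
        tot_sy += v_sy(y, a)
    theory = N**2 / n * (1 - n / N) * sigma**2
    return tot_srs / reps, tot_sy / reps, theory

print(model_expectations())
```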
