Chapter 3: Element sampling design: Part 1
Jae-Kwang Kim
Fall, 2014
Simple random sampling
1 Simple random sampling
2 SRS with replacement
3 Systematic sampling
Kim Ch. 3: Element sampling design: Part 1 Fall, 2014 2 / 31
Simple Random Sampling
Motivation: Choose $n$ units from $N$ units without replacement.
1 Each subset of $n$ distinct units is equally likely to be selected.
2 There are $\binom{N}{n}$ samples of size $n$ from $N$.
3 Give equal probability of selection to each subset with $n$ units.
Definition
Sampling design for SRS:
$$P(A) = \begin{cases} 1 \Big/ \binom{N}{n} & \text{if } |A| = n \\ 0 & \text{otherwise.} \end{cases}$$
Lemma
Under SRS, the inclusion probabilities are
$$\pi_i = \frac{n}{N}, \qquad \pi_{ij} = \frac{n(n-1)}{N(N-1)} \quad \text{for } i \neq j.$$
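For a small population the lemma can be checked by brute force. The sketch below is illustrative only: the function name `srs_inclusion_probabilities` and the choice $N = 6$, $n = 3$ are assumptions, not part of the lecture. It enumerates every subset of size $n$, gives each the probability $1/\binom{N}{n}$, and sums over the subsets that contain unit 1, or units 1 and 2.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def srs_inclusion_probabilities(N, n):
    """Return (pi_1, pi_12) under SRS by enumerating all size-n subsets."""
    p = Fraction(1, comb(N, n))  # P(A) for every subset A with |A| = n
    subsets = list(combinations(range(1, N + 1), n))
    pi_1 = sum(p for A in subsets if 1 in A)
    pi_12 = sum(p for A in subsets if 1 in A and 2 in A)
    return pi_1, pi_12


pi_1, pi_12 = srs_inclusion_probabilities(N=6, n=3)
print(pi_1, pi_12)  # 1/2 1/5
```

The two printed values agree with $n/N = 3/6$ and $n(n-1)/\{N(N-1)\} = 6/30$.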
Theorem
Under SRS design, the HT estimator
$$\hat{Y}_{HT} = \frac{N}{n} \sum_{i \in A} y_i = N \bar{y}$$
is unbiased for Y and has variance of the form
$$V\left(\hat{Y}_{HT}\right) = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) S^2$$
where
$$S^2 = \frac{1}{2} \frac{1}{N} \frac{1}{N-1} \sum_{i=1}^{N} \sum_{j=1}^{N} (y_i - y_j)^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left(y_i - \bar{Y}\right)^2.$$
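The theorem can be verified exactly on a toy population by listing every possible sample. In this sketch the data `y`, the sample size, and the function names are hypothetical choices made for illustration; exact rational arithmetic (`fractions.Fraction`) is used so the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations


def ht_moments_under_srs(y, n):
    """Exact design mean and variance of the HT estimator over all SRS samples."""
    N = len(y)
    estimates = [Fraction(N, n) * sum(s) for s in combinations(y, n)]
    mean = sum(estimates) / len(estimates)
    var = sum((e - mean) ** 2 for e in estimates) / len(estimates)
    return mean, var


def variance_formula(y, n):
    """(N^2 / n) (1 - n/N) S^2 from the theorem."""
    N = len(y)
    Ybar = Fraction(sum(y), N)
    S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
    return Fraction(N * N, n) * (1 - Fraction(n, N)) * S2


y = [3, 7, 8, 12, 15]  # hypothetical population values
mean, var = ht_moments_under_srs(y, n=2)
print(mean == sum(y), var == variance_formula(y, 2))  # True True
```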
Theorem (Cont’d)
Also, the SYG variance estimator is
$$\hat{V}\left(\hat{Y}_{HT}\right) = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) s^2$$
where
$$s^2 = \frac{1}{n-1} \sum_{i \in A} (y_i - \bar{y})^2.$$
Thus, under SRS, $E(s^2) = S^2$.
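A minimal sketch of the variance estimator, together with an exhaustive check that $s^2$ averages to $S^2$ over all samples. The population values and the names `sample_s2` and `variance_estimate` are assumptions introduced for this example.

```python
from fractions import Fraction
from itertools import combinations


def sample_s2(sample):
    """Sample variance s^2 with divisor n - 1."""
    n = len(sample)
    ybar = Fraction(sum(sample), n)
    return sum((v - ybar) ** 2 for v in sample) / (n - 1)


def variance_estimate(sample, N):
    """(N^2 / n) (1 - n/N) s^2 for a sample drawn by SRS from N units."""
    n = len(sample)
    return Fraction(N * N, n) * (1 - Fraction(n, N)) * sample_s2(sample)


y = [3, 7, 8, 12, 15]  # hypothetical population values
N, n = len(y), 3
all_s2 = [sample_s2(s) for s in combinations(y, n)]
expected_s2 = sum(all_s2) / len(all_s2)  # design expectation of s^2

Ybar = Fraction(sum(y), N)
S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
print(expected_s2 == S2)  # True
```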
Remark (under SRS)
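$1 - n/N$ is often called the finite population correction (FPC) term.
The FPC term can be ignored (FPC $\doteq 1$) if the sampling rate $n/N$ is small ($\le 0.05$) or for conservative inference.
For $n = 1$, the variance of the sample mean is
$$\frac{1}{n} \left(1 - \frac{n}{N}\right) S^2 = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \bar{Y}\right)^2 \equiv \sigma_Y^2.$$
Central limit theorem: under some conditions,
$$V^{-1/2} \left(\hat{Y}_{HT} - Y\right) = \frac{\bar{y} - \bar{Y}}{\sqrt{\frac{1}{n} \left(1 - \frac{n}{N}\right) S^2}} \to N(0, 1).$$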
Remark (under SRS)
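Sample size determination
1 Choose the target variance $V^*$ for $V(\bar{y})$.
2 Choose $n$ as the smallest integer satisfying
$$\frac{1}{n} \left(1 - \frac{n}{N}\right) S^2 \le V^*.$$
For dichotomous $y$ (taking 0 or 1), may use $S^2 \doteq P(1 - P) \le 1/4$. A simple rule is $n \ge d^{-2}$, where $d$ is the margin of error. (The rule follows from $d \doteq 2\sqrt{S^2/n}$ at the 95% level with $S^2 \le 1/4$, ignoring the FPC.)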
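The two steps can be sketched as a small function. The name `srs_sample_size` and the numbers are assumptions for illustration, as is the target $V^* = (d/2)^2$, which corresponds to a 95% margin of error $d$ with $z \approx 2$. Solving $(1/n - 1/N) S^2 \le V^*$ for $n$ gives $n \ge n_0 / (1 + n_0/N)$ with $n_0 = S^2 / V^*$.

```python
import math
from fractions import Fraction


def srs_sample_size(S2, V_star, N=None):
    """Smallest n with (1/n)(1 - n/N) S2 <= V_star; N=None ignores the FPC."""
    n0 = S2 / V_star  # sample size when the FPC is ignored
    if N is None:
        return math.ceil(n0)
    # (1/n - 1/N) S2 <= V_star  <=>  n >= n0 / (1 + n0 / N)
    return min(N, math.ceil(n0 / (1 + n0 / N)))


# Dichotomous y with the conservative bound S2 = 1/4 and margin of error d = 0.05.
# Fractions keep the boundary case exact (no floating-point rounding before ceil).
d = Fraction(5, 100)
V_star = (d / 2) ** 2
n_infinite = srs_sample_size(Fraction(1, 4), V_star)        # ignores the FPC
n_finite = srs_sample_size(Fraction(1, 4), V_star, N=2000)  # uses the FPC
print(n_infinite, n_finite)  # 400 334
```

Without the FPC the answer matches the simple rule $n \ge d^{-2} = 400$; with $N = 2000$ the FPC lowers the requirement.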
How to select a simple random sample of size $n$ from the finite population?
Draw-by-draw procedure
Rejective Bernoulli sampling method
Sample Reservoir method
Random sorting method
Draw-by-draw procedure
For example, consider $U = \{1, 2, \ldots, N\}$ and $n = 2$.
In the first draw, select one element with equal probability.
In the second draw, select one element with equal probability from $U - \{a_1\}$, where $a_1$ is the element selected from the first draw. Let $a_2$ be the element selected from the second draw.
Draw-by-draw procedure (Cont’d)
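$$P(a_1, a_2) = P(a_1) P(a_2 \mid U - \{a_1\}) + P(a_2) P(a_1 \mid U - \{a_2\}) = \frac{1}{N} \cdot \frac{1}{N-1} + \frac{1}{N} \cdot \frac{1}{N-1} = \frac{2}{N(N-1)} = 1 \Big/ \binom{N}{2}.$$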
We can prove similar results for general n. (Use mathematical induction).
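A minimal sketch of the draw-by-draw procedure for general $n$, assuming the population is labelled $1, \ldots, N$. The function name and the `rng` argument (any object with a `randrange` method, such as `random.Random`) are illustrative choices.

```python
import random


def draw_by_draw(N, n, rng=random):
    """Select n of the units 1..N, one at a time, without replacement."""
    remaining = list(range(1, N + 1))
    sample = []
    for _ in range(n):
        # Each remaining unit is equally likely at every draw.
        idx = rng.randrange(len(remaining))
        sample.append(remaining.pop(idx))
    return sample


sample = draw_by_draw(N=10, n=3, rng=random.Random(1))
print(sorted(sample))
```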
Rejective Bernoulli sampling method
1 Apply Bernoulli sampling of expected size $n$:
$$I_1, \ldots, I_N \sim \text{Bernoulli}(f)$$
where $f = n/N$.
2 Check if the realized sample size is $n$. If yes, accept the sample. Otherwise, go to Step 1.
Rejective Bernoulli sampling method (Cont’d)
Justification:
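$$P\left(I_1, I_2, \ldots, I_N \,\Big|\, \sum_{i=1}^{N} I_i = n\right) = \frac{\prod_{i=1}^{N} f^{I_i} (1 - f)^{1 - I_i}}{\binom{N}{n} f^n (1 - f)^{N-n}} = \frac{1}{\binom{N}{n}} \quad \text{if } \sum_{i=1}^{N} I_i = n.$$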
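The rejective scheme can be sketched as a loop that repeats Bernoulli sampling until the realized size equals $n$. The function name and the unit labels $1, \ldots, N$ are assumptions for illustration; the indicators are drawn independently, as Bernoulli sampling requires.

```python
import random


def rejective_bernoulli(N, n, rng=random):
    """Repeat Bernoulli(f = n/N) sampling until exactly n units are selected."""
    f = n / N
    while True:
        # Step 1: independent Bernoulli(f) indicators I_1, ..., I_N.
        indicators = [rng.random() < f for _ in range(N)]
        # Step 2: accept only if the realized sample size is n.
        if sum(indicators) == n:
            return [k for k, selected in enumerate(indicators, start=1) if selected]


sample = rejective_bernoulli(N=20, n=5, rng=random.Random(0))
print(sample)
```

The expected number of attempts is the reciprocal of $\binom{N}{n} f^n (1-f)^{N-n}$, so the method becomes slow when that probability is small.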
Reservoir method (McLeod and Bellhouse, 1983)
1 The first $n$ units are selected into the sample.
2 For each $k = n + 1, \ldots, N$:
  1 Select unit $k$ with probability $n/k$.
  2 If unit $k$ is selected, remove one element from the current sample with equal probability.
  3 Unit $k$ takes the place of the removed unit.
Note that the population size is not necessarily known. You can stop at any point of the process, and you will then have a simple random sample from the finite population considered up to that point.
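A minimal sketch of the reservoir steps for a stream of unknown length. The function name `reservoir_sample` and its arguments are illustrative assumptions; the stream can be any iterable.

```python
import random


def reservoir_sample(stream, n, rng=random):
    """Keep a simple random sample of size n from a stream of unknown length."""
    sample = []
    for k, unit in enumerate(stream, start=1):
        if k <= n:
            sample.append(unit)              # the first n units enter the sample
        elif rng.random() < n / k:           # unit k is selected with probability n/k
            sample[rng.randrange(n)] = unit  # it replaces a uniformly chosen element
    return sample


sample = reservoir_sample(iter(range(1, 101)), n=10, rng=random.Random(2))
print(sorted(sample))
```

Because `stream` is consumed one unit at a time, $N$ never has to be known in advance; if the stream holds fewer than $n$ units, all of them are returned.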
Random sorting method
1 A value of an independent uniform variable in [0,1] is allocated toeach unit of the population.
2 The population is sorted in ascending (or descending) order.
3 The first n units of the sorted population are selected in the sample.
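The three steps translate almost line for line into code. This is a sketch under the assumption that the population fits in memory as a list; `random_sort_sample` is a hypothetical name.

```python
import random


def random_sort_sample(units, n, rng=random):
    """Attach a uniform [0, 1) key to each unit, sort by key, keep the first n."""
    keyed = [(rng.random(), i) for i in range(len(units))]  # step 1
    keyed.sort()                                            # step 2 (ascending)
    return [units[i] for _, i in keyed[:n]]                 # step 3


population = ["a", "b", "c", "d", "e", "f"]
sample = random_sort_sample(population, n=3, rng=random.Random(3))
print(sample)
```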
SRS with replacement
In with-replacement sampling, order of the sample selection is important.
Ordered sample: $OS = (a_1, a_2, \ldots, a_n)$, where $a_i$ is the index of the element selected in the $i$-th with-replacement draw.
Sample: $A = \{k ; k = a_i \text{ for some } i, \ i = 1, 2, \ldots, n\}$
SRS with replacement: For each $i$-th draw, we use
$a_i = k$ with probability $1/N$, $k = 1, \ldots, N$.
Sample size is a random variable: Note that
$$\pi_k = \Pr(k \in A) = 1 - \Pr(k \notin A) = 1 - \left(1 - \frac{1}{N}\right)^n.$$
Thus, the expected number of distinct units is $n_0 = \sum_{k=1}^{N} \pi_k = N - N \left(1 - N^{-1}\right)^n \le n$, with strict inequality for $n \ge 2$.
1 First, define
$$Z_i = y_{a_i} = \sum_{k=1}^{N} y_k I(a_i = k).$$
Note that $Z_1, \ldots, Z_n$ are independent random variables since the $n$ draws are independent.
2 $Z_1, \ldots, Z_n$ are identically distributed since the same probabilities are used at each draw, where $E(Z_i) = \bar{Y}$ and
$$V(Z_i) = N^{-1} \sum_{k=1}^{N} \left(y_k - \bar{Y}\right)^2 \equiv \sigma_y^2.$$
3 Thus, $Z_1, \ldots, Z_n$ are IID with mean $\bar{Y}$ and variance $\sigma_y^2$. Use $\bar{z} = \sum_{k=1}^{n} Z_k / n$ to estimate $\bar{Y}$.
Estimation of Total
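Unbiased estimator of $Y$:
$$\hat{Y}_{SRSWR} = \frac{N}{n} \sum_{i=1}^{n} y_{a_i} = N \bar{y}_n.$$
Variance
$$V\left(\hat{Y}_{SRSWR}\right) = \frac{N^2}{n} \left(1 - \frac{1}{N}\right) S^2 = \frac{N^2}{n} \sigma_y^2 \ge V\left(\hat{Y}_{SRS}\right)$$
where $S^2 = (N - 1)^{-1} \sum_{i=1}^{N} (y_i - \bar{Y}_N)^2 = N (N - 1)^{-1} \sigma_y^2$.
Variance estimation
$$\hat{V}\left(\hat{Y}_{SRSWR}\right) = \frac{N^2}{n} s^2$$
where $s^2 = (n - 1)^{-1} \sum_{i=1}^{n} (y_{a_i} - \bar{y}_n)^2$. Note that $E(s^2) = \sigma_y^2$.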
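A minimal sketch of the with-replacement estimator and its variance estimator, followed by an exact comparison of the two design variances. The population values, the seed, and the function names are hypothetical.

```python
import random
from fractions import Fraction


def srswr_total_estimate(y, n, rng=random):
    """Draw n units with replacement; return (N * ybar_n, (N^2 / n) * s^2)."""
    N = len(y)
    draws = [y[rng.randrange(N)] for _ in range(n)]  # each unit has probability 1/N
    ybar_n = sum(draws) / n
    s2 = sum((z - ybar_n) ** 2 for z in draws) / (n - 1)
    return N * ybar_n, N * N / n * s2


def design_variances(y, n):
    """Exact variances of the total estimator: (with replacement, without)."""
    N = len(y)
    Ybar = Fraction(sum(y), N)
    sigma2 = sum((v - Ybar) ** 2 for v in y) / N
    S2 = sigma2 * N / (N - 1)
    v_wr = Fraction(N * N, n) * sigma2
    v_wor = Fraction(N * N, n) * (1 - Fraction(n, N)) * S2
    return v_wr, v_wor


y = [3, 7, 8, 12, 15]  # hypothetical population values
estimate, variance_estimate = srswr_total_estimate(y, n=3, rng=random.Random(7))
v_wr, v_wor = design_variances(y, n=2)
print(v_wr, v_wor)  # 215 645/4
```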
Systematic sampling
Setup:
1 Have $N$ elements in a list.
2 Choose a positive integer, $a$, called the sampling interval. Let $n = \lfloor N/a \rfloor$. That is, $N = na + c$, where $c$ is an integer with $0 \le c < a$.
3 Select a random start, $r$, from $\{1, 2, \ldots, a\}$ with equal probability.
4 The final sample is
$$A = \begin{cases} \{r, r + a, r + 2a, \ldots, r + (n - 1)a\} & \text{if } c < r \le a \\ \{r, r + a, r + 2a, \ldots, r + na\} & \text{if } 1 \le r \le c. \end{cases}$$
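Sample size can be random
$$n_A = \begin{cases} n & \text{if } c < r \le a \\ n + 1 & \text{if } r \le c. \end{cases}$$
Inclusion probabilities
$$\pi_k = \frac{1}{a}, \qquad \pi_{kl} = \begin{cases} 1/a & \text{if } k \text{ and } l \text{ belong to the same systematic sample} \\ 0 & \text{otherwise.} \end{cases}$$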
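A short sketch of the selection step, assuming the list is labelled $1, \ldots, N$. The function names and the example values $N = 23$, $a = 5$ (so $n = 4$, $c = 3$) are illustrative assumptions.

```python
import random


def systematic_sample_from_start(N, a, r):
    """Units r, r + a, r + 2a, ... that do not exceed N."""
    return list(range(r, N + 1, a))


def systematic_sample(N, a, rng=random):
    """Systematic sample with a random start r drawn uniformly from {1, ..., a}."""
    r = rng.randint(1, a)
    return systematic_sample_from_start(N, a, r)


# N = 23 and a = 5 give n = 4 and c = 3, so starts r <= 3 yield n + 1 = 5 units.
sizes = [len(systematic_sample_from_start(23, 5, r)) for r in range(1, 6)]
print(sizes)  # [5, 5, 5, 4, 4]
```

Every unit belongs to exactly one of the $a$ possible samples, which is why each unit is included with probability $1/a$.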
Remark
This is very easy to do.
This is a probability sampling design.
This is not a measurable sampling design: there is no design-unbiased estimator of the variance (because there is only one random draw).
Pick one set of elements (which always go together) & measure eachone: Later, we will call this cluster sampling.
Divide population into non-overlapping groups & choose an elementin each group: closely related to stratification.
Estimation
Partition the population into $a$ groups
$$U = U_1 \cup U_2 \cup \cdots \cup U_a$$
where the $U_i$ are disjoint.
Population total
$$Y = \sum_{i \in U} y_i = \sum_{r=1}^{a} \sum_{k \in U_r} y_k = \sum_{r=1}^{a} t_r$$
where $t_r = \sum_{k \in U_r} y_k$.
Think of a finite population with $a$ elements with measurements $t_1, \ldots, t_a$.
Estimation (Cont’d)
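HT estimator:
$$\hat{Y}_{HT} = \frac{t_r}{1/a} = a \, t_r, \quad \text{if } A = U_r.$$
Variance: Note that we are doing SRS of size 1 from the population of $a$ elements $\{t_1, \ldots, t_a\}$.
$$\text{Var}\left(\hat{Y}_{HT}\right) = \frac{a^2}{1} \left(1 - \frac{1}{a}\right) S_t^2$$
where
$$S_t^2 = \frac{1}{a - 1} \sum_{r=1}^{a} (t_r - \bar{t})^2$$
and $\bar{t} = \sum_{r=1}^{a} t_r / a$.
When is the variance small?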
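Since only $a$ samples are possible, the design distribution of the HT estimator can be listed in full. The sketch below uses a hypothetical population (the values 1 to 23 in list order) and a hypothetical function name; it computes the cluster totals $t_r$, the $a$ possible estimates $a \, t_r$, and compares their exact variance with $a^2 (1 - 1/a) S_t^2$.

```python
from fractions import Fraction


def systematic_ht_design(y, a):
    """Exact mean and variance of the HT estimator over the a possible starts."""
    totals = [sum(y[r::a]) for r in range(a)]  # t_r; list index r is start r + 1
    estimates = [a * t for t in totals]        # HT estimate when A = U_r
    mean = Fraction(sum(estimates), a)
    var = sum((e - mean) ** 2 for e in estimates) / a
    tbar = Fraction(sum(totals), a)
    S2_t = sum((t - tbar) ** 2 for t in totals) / (a - 1)
    formula = a * a * (1 - Fraction(1, a)) * S2_t
    return mean, var, formula


y = list(range(1, 24))  # hypothetical population, N = 23
mean, var, formula = systematic_ht_design(y, a=5)
print(mean == sum(y), var == formula)  # True True
```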
Estimation (Cont’d)
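Now, assuming $N = na$,
$$V\left(\hat{Y}_{HT}\right) = a (a - 1) S_t^2 = n^2 a \sum_{r=1}^{a} (\bar{y}_r - \bar{y}_u)^2$$
where $\bar{y}_r = t_r / n$ and $\bar{y}_u = \bar{t} / n$.
ANOVA: $U = \cup_{r=1}^{a} U_r$
$$SST = \sum_{k \in U} (y_k - \bar{y}_u)^2 = \sum_{r=1}^{a} \sum_{k \in U_r} (y_k - \bar{y}_u)^2 = \sum_{r=1}^{a} \sum_{k \in U_r} (y_k - \bar{y}_r)^2 + n \sum_{r=1}^{a} (\bar{y}_r - \bar{y}_u)^2 = SSW + SSB.$$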
$$V\left(\hat{Y}_{HT}\right) = na \cdot SSB = N \cdot SSB = N (SST - SSW).$$
If SSB is small, then the $\bar{y}_r$ are more alike and $V(\hat{Y}_{HT})$ is small.
If SSW is small, then $V(\hat{Y}_{HT})$ is large.
Intraclass correlation coefficient $\rho$ measures homogeneity of clusters.
$$\rho = 1 - \frac{n}{n - 1} \frac{SSW}{SST}$$
More details about $\rho$ will be covered in cluster sampling (Chapter 6).
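The decomposition and $\rho$ are easy to compute for a concrete list. This sketch assumes $N = na$ and uses a hypothetical linearly ordered population $y_k = k$; the function name and the returned dictionary keys are illustrative.

```python
from fractions import Fraction


def anova_systematic(y, a):
    """SST, SSW, SSB, V(Y_HT) = N * SSB and rho for the a systematic samples."""
    N = len(y)
    if N % a != 0:
        raise ValueError("this sketch assumes N = n * a")
    n = N // a
    ybar_u = Fraction(sum(y), N)
    clusters = [y[r::a] for r in range(a)]
    means = [Fraction(sum(c), n) for c in clusters]
    sst = sum((v - ybar_u) ** 2 for v in y)
    ssw = sum((v - m) ** 2 for c, m in zip(clusters, means) for v in c)
    ssb = n * sum((m - ybar_u) ** 2 for m in means)
    rho = 1 - Fraction(n, n - 1) * ssw / sst
    return {"SST": sst, "SSW": ssw, "SSB": ssb, "V_SY": N * ssb, "rho": rho}


result = anova_systematic(list(range(1, 21)), a=4)  # y_k = k, N = 20, n = 5
print(result["SST"], result["SSW"], result["SSB"], result["V_SY"], result["rho"])
# 665 640 25 500 -27/133
```

Here almost all of the variation is within the systematic samples (large SSW), so $\rho$ is negative and the variance is small.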
Comparison between systematic sampling (SY) and SRS
How does SY compare to SRS when the population is sorted in the following ways?
1 Random ordering: Intuitively should be the same
2 Linear ordering: SY should be better than SRS
3 Periodic ordering: if period = $a$, SY can be terrible.
4 Autocorrelated order: Successive $y_k$'s tend to lie on the same side of $\bar{y}_u$. Thus, SY should be better than SRS.
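Two of these cases can be illustrated with exact design variances for small artificial populations. Everything in this sketch is an assumption made for illustration: the populations ($y_k = k$ for the linear case, $y_k$ cycling through $0, \ldots, a - 1$ for the periodic case), the sizes $N = 20$, $a = 4$, and the function names. It assumes $N = na$.

```python
from fractions import Fraction


def v_srs(y, n):
    """Variance of the HT total estimator under SRS of size n."""
    N = len(y)
    Ybar = Fraction(sum(y), N)
    S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
    return Fraction(N * N, n) * (1 - Fraction(n, N)) * S2


def v_sy(y, a):
    """Variance of the HT total estimator under systematic sampling, N = n * a."""
    N = len(y)
    n = N // a
    ybar_u = Fraction(sum(y), N)
    means = [Fraction(sum(y[r::a]), n) for r in range(a)]
    return n * n * a * sum((m - ybar_u) ** 2 for m in means)


N, a = 20, 4
n = N // a
linear = list(range(1, N + 1))        # y_k = k
periodic = [k % a for k in range(N)]  # period equal to the sampling interval
for name, y in [("linear", linear), ("periodic", periodic)]:
    print(name, float(v_sy(y, a)), float(v_srs(y, n)))
```

In the linear case the systematic variance (500) is well below the SRS variance (2100); in the periodic case every systematic sample contains a single repeated value, and the systematic variance (500) far exceeds the SRS variance (about 79).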
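How to quantify?
$$V_{SRS}\left(\hat{Y}_{HT}\right) = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) \frac{1}{N - 1} \sum_{k=1}^{N} \left(y_k - \bar{Y}_N\right)^2$$
$$V_{SY}\left(\hat{Y}_{HT}\right) = n^2 a \sum_{r=1}^{a} (\bar{y}_r - \bar{y}_u)^2$$
Cochran (1946) introduced the superpopulation model to deal with this problem (treat $y_k$ as a random variable).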
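Example: Superpopulation model for a population in random order.
Denote the model by $\zeta$: $\{y_k\} \overset{iid}{\sim} (\mu, \sigma^2)$
$$E_\zeta \left\{ V_{SRS}\left(\hat{Y}_{HT}\right) \right\} = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) \sigma^2$$
$$E_\zeta \left\{ V_{SY}\left(\hat{Y}_{HT}\right) \right\} = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) \sigma^2$$
Thus, the model expectations of the design variances are the same under the IID model.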
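The second expectation can be checked in a few lines. This is a sketch of the standard argument, not a derivation taken from the slides, and it assumes $N = na$ so that each $\bar{y}_r$ is the mean of $n$ IID values. Under $\zeta$, the means $\bar{y}_1, \ldots, \bar{y}_a$ are IID with variance $\sigma^2 / n$ and $\bar{y}_u$ is their average, so
$$E_\zeta \left\{ \sum_{r=1}^{a} (\bar{y}_r - \bar{y}_u)^2 \right\} = (a - 1) \frac{\sigma^2}{n}.$$
Therefore
$$E_\zeta \left\{ V_{SY}\left(\hat{Y}_{HT}\right) \right\} = n^2 a \, (a - 1) \frac{\sigma^2}{n} = N (a - 1) \sigma^2 = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) \sigma^2,$$
where the last step uses $\frac{N^2}{n} \left(1 - \frac{n}{N}\right) = \frac{N (N - n)}{n} = N (a - 1)$.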