Overview of the state-of-the-art of survey sampling
Imbi Traat, Institute of Mathematical Statistics, University of Tartu
August 23-27, 2009, Kyiv
4 lectures



The four lectures

• Population, domains, parameters, sample, sampling design, sampling procedure;

• Unbiased estimation and variance estimation, the effect of the second-order inclusion probabilities;

• More on estimation;

• Calibration: for increasing precision, for compensating for non-response, for achieving consistency between estimates.


• Sample survey: a sample-based study of a finite population,

• the aim: reliable, timely (regular), cost-efficient estimates for the population and for its domains,

• today, a worldwide survey industry,

• government, academic, private, mass-media, etc. sectors,

• central statistical offices in many countries have to provide information by law,

• proper specialized education at universities needed.


Some steps in History

• The 1891 census in Norway used partial investigation of the population,

• Kiaer (1897) describes the representative method (SI-sampling, purposive sampling),

• Neyman (1934) gives a new sense to "representative" (confidence intervals),

• stratified sampling with different sampling fractions is not representative in the old sense but can be more efficient,

• the study of different sampling methods then developed explosively.


Some steps in History

• 1991 Estonia became independent,

• 1993 First sampling course at Tartu University,

• 1995 Household Budget Survey, Labour Force Survey, Enterprise Survey,

• many difficulties,

• the Baltic-Nordic Network in Survey Sampling Theory and Methodology has been very important.


Population and other notions

Population (finite, size $N$, elements/units identified):

$U = \{1, 2, \ldots, N\}$.

Frame: a list of population units with some characteristics (nowadays kept in a computer file), e.g. a register.

Units (people, schools, enterprises, farms, ...):

$k$, $k \in U$.

Sample (a set, part of $U$):

$s$, sample size $n$.


Variable, its value on unit $k$:

$y_k$: study variable (income, education, labour force status, ...),

$x_k$: auxiliary variable (age, sex, address, ...).

The values are taken from a register or measured in a survey.

Population parameters:

$Y = \sum_U y_k$: total,

$\bar{Y} = \frac{1}{N} \sum_U y_k$: mean,

$R = Y / Z$: ratio of two totals, where $Z = \sum_U z_k$.

Special interpretation for binary $y_k \in \{0, 1\}$: the total $Y$ is the number of units with the attribute, and the mean $\bar{Y}$ is their proportion.


Domain: a part of the population.

Usually formed by a categorical variable, or by cross-classifying several categorical variables (county, sex-age class):

$U_d$, size $N_d$, $d = 1, 2, \ldots, D$.

Small domains lead to small area estimation.

Domain parameters:

$Y_d = \sum_{U_d} y_k$ (total),

$\bar{Y}_d = \frac{1}{N_d} \sum_{U_d} y_k$ (mean),

$R_d = Y_d / Z_d$, where $Z_d = \sum_{U_d} z_k$ (ratio).


Probability sample: $s$ is random, and every sample has a probability $p(s)$.

Sampling design: the probability distribution $p(s)$ on all possible samples $s$.

Task 1. Let $N = 4$, i.e. $U = \{1,2,3,4\}$, and let $n = 2$. Fill in!

s       {1,2}   {1,3}   ...
p(s)    0.3     ...

The probabilities $p(s)$ are determined by the sampling procedure; they are often unknown.

Sampling procedure: the activities carried out to draw a sample.

For a given sampling design $p(s)$ there are many sampling procedures.


The design $p(s)$ defines the random selection of a population unit $k$:

$$I_k = \begin{cases} 1, & \text{unit } k \text{ is selected}, \\ 0, & \text{not selected}, \end{cases}$$

the sampling indicator, a random variable;

$\pi_k = P(I_k = 1)$, inclusion probability,

$\pi_{kl} = P(I_k = 1, I_l = 1)$, second-order inclusion probability.

The $\pi_k$ are needed for (nearly) unbiased estimation, the $\pi_{kl}$ are needed in variance formulae.

Task 2. Find $\pi_1, \pi_2, \pi_3, \pi_4$ for the design of Task 1. We use the formula

$$\pi_k = \sum_{s:\, k \in s} p(s).$$

The $\pi_k$ are known or approximately known in practice; the $\pi_{kl}$ are often unknown.
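The summation formula can be sketched in a few lines of Python. This is only an illustration: the function name `inclusion_probabilities` and the design below are hypothetical (the design is an arbitrary example on $U = \{1,2,3,4\}$ with $n = 2$, not the completed table of Task 1).

```python
from itertools import combinations

def inclusion_probabilities(design):
    """First- and second-order inclusion probabilities of a design.

    `design` maps each sample (a tuple of unit labels) to its probability p(s).
    """
    pi = {}    # pi[k]       = sum of p(s) over samples containing k
    pi2 = {}   # pi2[(k, l)] = sum of p(s) over samples containing k and l, k < l
    for s, p in design.items():
        for k in s:
            pi[k] = pi.get(k, 0.0) + p
        for pair in combinations(sorted(s), 2):
            pi2[pair] = pi2.get(pair, 0.0) + p
    return pi, pi2

# Hypothetical design (an assumption for illustration only).
design = {
    (1, 2): 0.3, (1, 3): 0.2, (1, 4): 0.1,
    (2, 3): 0.1, (2, 4): 0.1, (3, 4): 0.2,
}
pi, pi2 = inclusion_probabilities(design)
print(pi)    # first-order inclusion probabilities
print(pi2)   # second-order inclusion probabilities
```

For a fixed-size design the first-order probabilities sum to the sample size $n$, which gives a quick sanity check on any hand calculation.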


Some classical sampling procedures

• systematic sampling;

• probability-proportional-to-size-sampling;

• cluster sampling;

• multi-stage sampling;

• multi-phase sampling;

• stratified sampling, with different designs in the strata.

A sampling design can be without replacement (WOR) or with replacement (WR); of fixed or variable sample size; an equal or unequal probability design.


Simple random sampling without replacement − SI

DEF: All samples of size $n$ are equally probable: $p(s) = 1 / \binom{N}{n}$.

The SI design

• is independent of all variables (distributions in the sample ≈ distributions in the population);

• aspires to proportional allocation of the sample to all groups of the population;

• is the theoretically best-studied design;

• is used as a component of complex designs in practice;

• has properties that are used as approximations for other designs.


Some sampling procedures for drawing a SI-sample

• enumerate all samples and draw one at random (like a ball from an urn);

• draw-wise selection of units (a selected unit is deleted from $U$);

• list-wise: a Bernoulli experiment is performed for each unit in $U$ in turn, with probability (given the outcomes $i_1, \ldots, i_k$ for the first $k$ units)

$$\Pr(I_{k+1} = 1) = \frac{n - \sum_{j=1}^{k} i_j}{N - k};$$

• rejective: Bernoulli sampling with probabilities $\pi_k = n/N$ is performed in $U$, and the sample is rejected if its size is not $n$;

• order sampling (good for sharing response burden).
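The list-wise procedure from the list above can be sketched as follows. This is a minimal illustration, not a production sampler; the function name `si_listwise` and the choice of Python's standard `random` module are assumptions.

```python
import random

def si_listwise(N, n, rng=random):
    """Draw an SI sample of size n from U = {1, ..., N} in one pass over the list."""
    sample = []
    for k in range(N):                      # k units have already been visited
        still_needed = n - len(sample)      # n minus the sum of i_1, ..., i_k
        # Unit k+1 is selected with probability still_needed / (N - k).
        if rng.random() < still_needed / (N - k):
            sample.append(k + 1)
    return sample

rng = random.Random(2009)   # fixed seed only to make the illustration repeatable
print(si_listwise(10, 4, rng))
```

The selection probability becomes 1 when all remaining units are needed and 0 once $n$ units have been selected, so the sample size is always exactly $n$.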


Characteristics of the SI-design

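$$\pi_k = \frac{\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{n}{N}, \quad \forall k,$$

$$\pi_{kl} = \frac{n}{N} \cdot \frac{n-1}{N-1}, \quad \forall k \neq l.$$

If some other design has the same $\pi_k$ and $\pi_{kl}$ as SI, it still need not be an SI design.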


An alternative approach to samples and sampling design

Example. Let $N = 4$, $n = 2$. Suppose we have the frame of $U$ in a computer file. We create a column in which sampled elements are denoted by 1.

U    1st sample    2nd sample
1    1             1
2    1             0
3    0             1
4    0             0

A sample is a vector $\mathbf{i} = (i_1, i_2, \ldots, i_N)$; a sampling design is a set of probabilities on these vectors.

A sampling design is a multivariate discrete distribution of the sampling vector $\mathbf{I}$,

$\mathbf{I} \sim p(\mathbf{i}) = \Pr(\mathbf{I} = \mathbf{i})$.


Advantages

• some sampling designs have explicit probability functions p(i);

• tools from distribution theory become applicable;

• drawing a sample is simulation from distribution;

• unified consideration for WOR and WR designs.


Probability functions of some WOR sampling designs

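Conditional Poisson:

$$p(\mathbf{i}) = C \cdot \prod_k p_k^{i_k} (1 - p_k)^{1 - i_k}, \quad |\mathbf{i}| = n.$$

Sampford:

$$p(\mathbf{i}) = C \cdot \prod_k p_k^{i_k} (1 - p_k)^{1 - i_k} \times \sum_k (1 - p_k) i_k, \quad |\mathbf{i}| = n.$$

Pareto:

$$p(\mathbf{i}) = \prod_k p_k^{i_k} (1 - p_k)^{1 - i_k} \times \sum_k c_k i_k, \quad |\mathbf{i}| = n,$$

where

$$c_k = \int_0^\infty x^{n-1} \prod_j \frac{1 + \tau_j}{1 + \tau_j x} \cdot \frac{1}{1 + \tau_k x} \, dx, \quad \text{with } \tau_j = p_j / (1 - p_j).$$

Parameters: $0 < p_k < 1$, $\sum p_k = n$.

$\pi_k = p_k$ for Sampford, otherwise $\pi_k \approx p_k$ (a lot of research!)

If $p_k \equiv n/N$, then each case reduces to SI.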


Research problems

• relation of these designs to each other,

• good approximations to πkl,

• simulation with advanced methods (MCMC) for getting a sample rapidly.

The probability functions of some WR designs

Multinomial design $\mathrm{Mult}(n; p_1, p_2, \ldots, p_N)$:

$$p(\mathbf{i}) = n! \prod_k \frac{p_k^{i_k}}{i_k!}, \quad |\mathbf{i}| = n,$$

$\sum p_k = 1$, $i_k \in \{0, 1, \ldots, n\}$.

If $p_k \equiv 1/N$, then this is the SIR design.


Estimation

Let $\hat\theta$ be an estimator of $\theta$.

Different approaches to describing the randomness of $\hat\theta$:

• the design-based approach: $\hat\theta = \hat\theta(s)$, so $\hat\theta$ is a discrete r.v. taking its values with probabilities $p(s)$; all of its characteristics are defined by $p(s)$;

• the model-based approach: $y_k = f(x_k) + \varepsilon_k$, $k \in s$; assumptions on the r.v. $\varepsilon_k$ define the properties of the estimator $\hat\theta$;

• the model-assisted approach: the relations $y_k = f(x_k) + \varepsilon_k$, $k \in s$, with assumptions on $\varepsilon_k$, are used for constructing $\hat\theta$, while its properties are defined by $p(s)$.

The model-based approach is useful in small area estimation; statistical agencies prefer the design-based and model-assisted approaches.


General unbiased estimator for the total $Y = \sum_U y_k$ (design-based approach):

Horvitz-Thompson (HT) estimator

$$\hat{Y} = \sum_s \frac{y_k}{\pi_k}.$$

Alternatively,

$$\hat{Y} = \sum_s a_k y_k,$$

where $a_k = 1/\pi_k$ is the design weight.

Interpretation! Weight column. Weight modification. Self-weighting design.

Note: if $y_k \equiv 1$, then $N = \sum_U y_k$ is the population size, and $\hat{N} = \sum_s a_k$ is its unbiased estimator.


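Example: the average household size $H = M/N$.

$N$: number of households (hhs), $M$: number of people, $m_k$: size of hh $k$.

A sample $s$ of $n$ hhs is drawn through people from the Population Register. Big hhs are over-represented:

$\pi_k = n m_k / M$: inclusion probability of hh $k$ (approximate);

Sample mean $\hat{H} = \sum_s m_k / n$: biased;

HT-estimator $\hat{H} = \frac{1}{N} \sum_s \frac{m_k}{n m_k / M} = \frac{M}{N}$: exact.

Usually $N$ is unknown; use $\hat{N} = \sum_s 1/\pi_k = \frac{M}{n} \sum_s \frac{1}{m_k}$.

Now

$$\hat{H} = \frac{\hat{M}}{\hat{N}} = \frac{M}{\hat{N}} = \frac{n}{\sum_s 1/m_k},$$

which is approximately unbiased.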
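A small numerical sketch makes the difference between the two estimators concrete. The sample of household sizes below is invented for illustration, and the function name `household_size_estimates` is hypothetical.

```python
def household_size_estimates(m_sample):
    """Return (naive sample mean, n / sum(1/m_k)) for sampled household sizes."""
    n = len(m_sample)
    naive = sum(m_sample) / n                       # biased: big hhs over-represented
    ratio = n / sum(1.0 / m for m in m_sample)      # M / N-hat with N-hat = (M/n) * sum 1/m_k
    return naive, ratio

# Hypothetical sizes of four households reached through sampled persons.
naive, ratio = household_size_estimates([1, 2, 4, 4])
print(naive, ratio)
```

The second estimator is the harmonic mean of the sampled household sizes, so it can never exceed the naive sample mean; this matches the direction of the bias described above.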


The HT-estimator is unbiased under any design.

Since $E(I_k) = \Pr(I_k = 1) = \pi_k$, we have

$$E(\hat{Y}) = E\Big(\sum_s \frac{y_k}{\pi_k}\Big) = E\Big(\sum_U I_k \frac{y_k}{\pi_k}\Big) = \sum_U E(I_k) \frac{y_k}{\pi_k} = \sum_U y_k = Y.$$

Meaning of unbiasedness and variability in the design-based sense.

Task 3. Let $N = 4$ and $\mathbf{y} = (9, 6, 4, 1)$. Find $Y$ and, from each sample, $\hat{Y}$ under the design of Task 1. Do we see variability? Is the estimator unbiased?
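Design-based unbiasedness can be checked by brute force: compute the HT estimate on every possible sample and average with the weights $p(s)$. The sketch below uses the $y$-values of Task 3 but a hypothetical design (an assumption for illustration; it is not the completed design of Task 1).

```python
# Hypothetical design on U = {1, 2, 3, 4}, n = 2 (illustration only).
design = {
    (1, 2): 0.3, (1, 3): 0.2, (1, 4): 0.1,
    (2, 3): 0.1, (2, 4): 0.1, (3, 4): 0.2,
}
y = {1: 9.0, 2: 6.0, 3: 4.0, 4: 1.0}

# pi_k = sum of p(s) over the samples that contain k.
pi = {k: sum(p for s, p in design.items() if k in s) for k in y}

def ht_total(sample):
    """Horvitz-Thompson estimate of the total from one sample."""
    return sum(y[k] / pi[k] for k in sample)

estimates = {s: ht_total(s) for s in design}
expectation = sum(p * estimates[s] for s, p in design.items())

for s, est in estimates.items():
    print(s, round(est, 3))            # the estimate varies from sample to sample
print("E(Y_hat) =", round(expectation, 6), " Y =", sum(y.values()))
```

The individual estimates differ between samples (variability), while their $p(s)$-weighted average reproduces the true total (unbiasedness).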


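Variance of the HT-estimator:

$$V(\hat\theta) = E(\hat\theta - E(\hat\theta))^2.$$

It measures the sample-to-sample variability around $E(\hat\theta)$.

For the HT-estimator, denoting $\check{y}_k = y_k / \pi_k$, we have

$$V(\hat{Y}) = \sum_{k \in U} \sum_{l \in U} (\pi_{kl} - \pi_k \pi_l) \check{y}_k \check{y}_l.$$

An unbiased estimator of the variance is

$$\hat{V}(\hat{Y}) = \sum_{k \in s} \sum_{l \in s} \Big(1 - \frac{\pi_k \pi_l}{\pi_{kl}}\Big) \check{y}_k \check{y}_l.$$

Sen-Yates-Grundy variance estimator for fixed-size designs:

$$\hat{V}(\hat{Y}) = -\frac{1}{2} \sum_{k \in s} \sum_{l \in s} \Big(1 - \frac{\pi_k \pi_l}{\pi_{kl}}\Big) (\check{y}_k - \check{y}_l)^2.$$

Other precision measures: $\sqrt{V(\hat{Y})}$, the standard error, and $\sqrt{V(\hat{Y})} / Y$, the relative error. How large a relative error can be tolerated?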
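The formulae above can be verified numerically on a tiny population by enumerating all samples. The design below is hypothetical (all $\pi_{kl} > 0$, fixed size $n = 2$), and all variable names are illustrative assumptions.

```python
design = {
    (1, 2): 0.3, (1, 3): 0.2, (1, 4): 0.1,
    (2, 3): 0.1, (2, 4): 0.1, (3, 4): 0.2,
}
y = {1: 9.0, 2: 6.0, 3: 4.0, 4: 1.0}
U = sorted(y)
Y = sum(y.values())

pi = {k: sum(p for s, p in design.items() if k in s) for k in U}

def pi2(k, l):
    """Second-order inclusion probability, with pi_kk = pi_k."""
    if k == l:
        return pi[k]
    return sum(p for s, p in design.items() if k in s and l in s)

yc = {k: y[k] / pi[k] for k in U}          # the expanded values y_k / pi_k

def ht_total(s):
    return sum(yc[k] for k in s)

# Variance by definition: p(s)-weighted squared deviations from Y.
var_enum = sum(p * (ht_total(s) - Y) ** 2 for s, p in design.items())

# Variance by the double-sum formula over the population.
var_formula = sum((pi2(k, l) - pi[k] * pi[l]) * yc[k] * yc[l] for k in U for l in U)

def syg(s):
    """Sen-Yates-Grundy variance estimate from one sample."""
    return -0.5 * sum((1 - pi[k] * pi[l] / pi2(k, l)) * (yc[k] - yc[l]) ** 2
                      for k in s for l in s if k != l)

expected_syg = sum(p * syg(s) for s, p in design.items())
print(var_enum, var_formula, expected_syg)
```

All three printed numbers should agree up to rounding: the double-sum formula reproduces the variance, and the Sen-Yates-Grundy estimator is unbiased for it under this fixed-size design.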


Unbiased estimation for domains

The same formulae can be applied with a simple trick.

Define a new variable

$y_{dk} = y_k$ if $k \in U_d$, otherwise $y_{dk} = 0$.

Now the domain total is a population total of the new variable,

$$Y_d = \sum_{U_d} y_k = \sum_U y_{dk}.$$

We know how to estimate the population total $\sum_U y_{dk}$ and what its variance is.
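As a sketch of this trick, the function below builds the zero-extended variable on the fly and applies the HT estimator to it. The data, the domain definition and the name `ht_domain_total` are hypothetical.

```python
def ht_domain_total(sample, y, pi, in_domain):
    """HT estimate of a domain total via y_dk (y_k inside the domain, 0 outside)."""
    total = 0.0
    for k in sample:
        y_dk = y[k] if in_domain(k) else 0.0
        total += y_dk / pi[k]
    return total

# Hypothetical population of N = 6 units, SI sample of n = 3 (pi_k = 0.5).
y = {1: 10.0, 2: 4.0, 3: 7.0, 4: 8.0, 5: 1.0, 6: 5.0}
pi = {k: 0.5 for k in y}
sample = [1, 2, 4]

# Domain: units with an even label (an arbitrary illustrative choice).
estimate = ht_domain_total(sample, y, pi, lambda k: k % 2 == 0)
print(estimate)
```

Sampled units outside the domain contribute zero, so the same weights serve every domain.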


Estimation under SI

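The HT-estimator, its variance and variance estimator:

$$\hat{Y} = \sum_s \frac{y_k}{\pi_k} = \sum_s \frac{N}{n} y_k = N \bar{y},$$

$$V(\hat{Y}) = N^2 (1 - f) S^2 / n,$$

$$\hat{V}(\hat{Y}) = N^2 (1 - f) s^2 / n,$$

where

$\bar{y} = \frac{1}{n} \sum_s y_k$ is the sample mean,

$f = \frac{n}{N}$ is the sampling fraction,

$S^2 = \frac{1}{N - 1} \sum_U (y_k - \bar{Y})^2$ is the population variance of the variable,

$s^2 = \frac{1}{n - 1} \sum_s (y_k - \bar{y})^2$ is the sample variance of the variable.

The unbiased estimator of the population mean $\bar{Y} = Y/N$ is $\bar{y}$, with $\hat{V}(\bar{y}) = (1 - f) s^2 / n$. Note the difference from the classical result!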
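The SI formulae are easy to check by enumeration on a small population. The sketch below uses the four $y$-values from Task 3 with $n = 2$; the function name `si_estimates` is a hypothetical helper.

```python
from itertools import combinations

def si_estimates(sample_values, N):
    """Return (estimated total N * ybar, estimated variance N^2 (1 - f) s^2 / n)."""
    n = len(sample_values)
    ybar = sum(sample_values) / n
    s2 = sum((v - ybar) ** 2 for v in sample_values) / (n - 1)
    f = n / N
    return N * ybar, N ** 2 * (1 - f) * s2 / n

y = [9.0, 6.0, 4.0, 1.0]
N, n = len(y), 2

# Under SI every sample of size n has the same probability.
samples = list(combinations(y, n))
totals = [si_estimates(s, N)[0] for s in samples]
var_hats = [si_estimates(s, N)[1] for s in samples]

mean_total = sum(totals) / len(totals)
true_var = sum((t - mean_total) ** 2 for t in totals) / len(totals)
mean_var_hat = sum(var_hats) / len(var_hats)

# Theoretical variance N^2 (1 - f) S^2 / n for comparison.
Ybar = sum(y) / N
S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
theory = N ** 2 * (1 - n / N) * S2 / n
print(mean_total, true_var, mean_var_hat, theory)
```

The last three printed values coincide, which illustrates both the variance formula and the unbiasedness of its estimator.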


Task 5.

Take the $y$-variable of Task 3. Find an estimate of $Y$ from each sample, assuming an SI design with sample size 2. Compare the variability of this estimator with that of the estimator in Task 3.

Which design is more efficient for estimating $Y$?


A recent alternative form for variance of HT-estimator

Note that the SI variance can be written as

$$V_{SI}(\hat{Y}) = \Big(1 - \frac{n - 1}{N - 1}\Big) V_{SIR}(\hat{Y}),$$

where SIR means simple random sampling with replacement.

SIR corresponds to iid sampling,

SI is more efficient than SIR,

how much more efficient depends here on $n$.

Knottnerus (2003) has derived a similar expression for general WOR and WR designs.


The Knottnerus variance of the HT-estimator

Let a WOR design of fixed size $n$ have inclusion probabilities $\pi_k$ and $\pi_{kl}$.

Let the WR design be multinomial, $\mathrm{Mult}(n; p_1, p_2, \ldots, p_N)$, such that the expected numbers of selections of each unit under the two designs are equal,

$\pi_k = n p_k$.

Then

$$V_{WOR}(\hat{Y}) = (1 + (n - 1)\rho) \, V_{Mult}(\hat{Y}),$$

where

$$V_{Mult}(\hat{Y}) = \sum_U \pi_k (\check{y}_k - Y/n)^2, \quad \check{y}_k = y_k / \pi_k.$$

The factor $1 + (n - 1)\rho$ is a generalized finite population correction term; it shows how many times the WOR design is more efficient than the WR design.

The effect of the second-order inclusion probabilities on the variance comes through $\rho$.

$\rho$ is the sampling autocorrelation.


The sampling autocorrelation

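$$\rho = \frac{\sum\sum_{i \neq j} \pi_{ij} (\check{y}_i - Y/n)(\check{y}_j - Y/n)}{(n - 1) \sum_i \pi_i (\check{y}_i - Y/n)^2}, \quad \check{y}_i = y_i / \pi_i.$$

• why autocorrelation?

• note that it depends on both the $\pi_{kl}$ and the study variable,

• it is known for some simple designs, e.g. $-1/(N - 1)$ for SI,

• the limits are $-(n - 1)^{-1} \le \rho \le 1$,

• it is difficult to estimate; otherwise good new variance estimators could be obtained.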
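The decomposition $V_{WOR} = (1 + (n-1)\rho) V_{Mult}$ can be checked numerically by enumeration. The sketch below does this for a hypothetical fixed-size design with $n = 2$ on four units; the design and all names are assumptions made for illustration.

```python
design = {
    (1, 2): 0.3, (1, 3): 0.2, (1, 4): 0.1,
    (2, 3): 0.1, (2, 4): 0.1, (3, 4): 0.2,
}
y = {1: 9.0, 2: 6.0, 3: 4.0, 4: 1.0}
n = 2
U = sorted(y)
Y = sum(y.values())

pi = {k: sum(p for s, p in design.items() if k in s) for k in U}

# Centred expanded values d_k = y_k / pi_k - Y / n.
d = {k: y[k] / pi[k] - Y / n for k in U}

# Multinomial (WR) variance with p_k = pi_k / n.
v_mult = sum(pi[k] * d[k] ** 2 for k in U)

# Numerator of rho: sum over ordered pairs k != l of pi_kl * d_k * d_l,
# accumulated sample by sample.
cross = sum(p * d[k] * d[l]
            for s, p in design.items() for k in s for l in s if k != l)
rho = cross / ((n - 1) * v_mult)

# WOR variance of the HT estimator directly from the definition.
v_wor = sum(p * (sum(y[k] / pi[k] for k in s) - Y) ** 2 for s, p in design.items())

print(rho, v_wor, (1 + (n - 1) * rho) * v_mult)
```

For this particular design and variable $\rho$ is slightly negative, so the WOR design is somewhat more efficient than its multinomial counterpart.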


Gabler’s condition

Look at

$$D = \frac{V_{WOR}(\hat{Y})}{V_{Mult}(\hat{Y})} = 1 + (n - 1)\rho.$$

$D \le 1$ if $\rho \le 0$; the latter depends on the $y$-variable.

When is $D \le 1$ uniformly (i.e. $\forall \mathbf{y}$)?

Gabler (1984):

If $\sum_{i \in U} \min_j \frac{\pi_{ij}}{\pi_j} \ge n - 1$, then $D \le 1$ uniformly.

How much less than 1? Is a sharper bound possible?
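As a quick worked check of the condition (an added illustration, assuming the minimum is taken over $j \neq i$), consider the SI design, where $\pi_{ij} / \pi_j = (n-1)/(N-1)$ for every pair $i \neq j$:

```latex
\sum_{i \in U} \min_{j \neq i} \frac{\pi_{ij}}{\pi_j}
  = N \cdot \frac{n-1}{N-1}
  \;\ge\; n - 1 ,
\qquad\text{and indeed}\qquad
D = 1 + (n-1)\Big(-\frac{1}{N-1}\Big) = 1 - \frac{n-1}{N-1} \le 1 .
```

So SI satisfies the condition for any $y$-variable, in agreement with the known value $\rho = -1/(N-1)$.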


A sharper bound for D, our study

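We assume $\pi_1 \le \pi_2 \le \ldots \le \pi_N$. Consider the matrix $B = (b_{ij})$,

$$b_{ij} = \begin{cases} 1, & \text{if } i = j, \\ \dfrac{\pi_{ij}}{\pi_j}, & \text{otherwise}. \end{cases}$$

Let $\lambda_2$ be the second largest eigenvalue of $B$. It holds that

$D \le \lambda_2$.

Under Gabler's condition, $\lambda_2 \le 1$.

Can $\lambda_2$ be found exactly?

We consider designs for which

$b_{ij}$ increases with $j$, $i \neq j$.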


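These are the Conditional Poisson design, Hájek's approximate design and some other designs.

For such designs

$$1 - \frac{2}{\pi_1 + \pi_2} \pi_{12} \le \lambda_2 \le 1 - \frac{\pi_{12}}{\pi_2}.$$

If $\pi_1 = \pi_2$, then

$$\lambda_2 = 1 - \frac{\pi_{12}}{\pi_1}.$$

For SI-sampling,

$$D = \lambda_2 = 1 - \frac{n - 1}{N - 1}.$$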


Stratified sampling: the design most often used in practice.

The population $U$ is divided into groups (strata) with the help of stratification variables. Sampling is performed separately in each stratum.

For example, in the Enterprise Survey the strata are formed by activity and number of employees.

Why stratification?

• administrative reasons: coordination, sharing responsibility, different situations requiring different sampling methods;

• coverage of important domains by sampled elements;

• cost reduction;

• variance reduction;

• anticipating non-response.


Estimation under stratified sampling.

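Let $U_h$, $h = 1, 2, \ldots, H$, be the strata in $U$.

Let the total in $U_h$ be $Y_h = \sum_{U_h} y_k$.

The population total is then $Y = \sum_{h=1}^{H} Y_h$.

The $Y_h$ can be estimated by the HT-estimator:

$$\hat{Y}_h = \sum_{s_h} \frac{y_k}{\pi_k}.$$

The population total is estimated by $\hat{Y} = \sum_{h=1}^{H} \hat{Y}_h$.

The respective variance estimator is $\hat{V}(\hat{Y}) = \sum_{h=1}^{H} \hat{V}(\hat{Y}_h)$.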


Forming strata for variance reduction.

We see from $V(\hat{Y}) = \sum_{h=1}^{H} V(\hat{Y}_h)$ that the variance is smaller if the variances in the strata are smaller.

Therefore the strata should be formed to be as homogeneous as possible.

A problem arises when there are many study variables: which are the most important?


Stratified sampling with SI in strata

Used very often.

Let stratum $U_h$ have size $N_h$ and sample size $n_h$.

Then, under SI within each stratum,

$\pi_k = \frac{n_h}{N_h}$ for $k$ in stratum $U_h$,

$$\hat{Y}_h = \frac{N_h}{n_h} \sum_{s_h} y_k = N_h \bar{y}_h,$$

$$\hat{Y} = \sum_{h=1}^{H} \hat{Y}_h.$$

The design weights $a_k = N_h / n_h$ are constant within a stratum but may differ between strata.

Task 6. Consider stratified SI sampling. Write down the variance estimator for $\hat{Y}$. Explain the notation.
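The point estimator under stratified SI can be sketched as follows (the variance estimator is deliberately left to Task 6). The strata and observed values are hypothetical, as is the function name `stratified_si_total`.

```python
def stratified_si_total(strata):
    """Estimate the population total under SI within each stratum.

    `strata` is a list of pairs (N_h, sampled y-values in stratum h).
    """
    total = 0.0
    for N_h, ys in strata:
        n_h = len(ys)
        a_h = N_h / n_h            # design weight, constant within the stratum
        total += a_h * sum(ys)     # equals N_h times the stratum sample mean
    return total

# Two hypothetical strata: (stratum size, observed values).
strata = [(10, [2.0, 4.0]), (30, [1.0, 2.0, 3.0])]
print(stratified_si_total(strata))
```

Here the weights are 5 in the first stratum and 10 in the second, so the same observed value counts twice as much when it comes from the second stratum.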


Allocation of sample in strata.

Let the total sample size $n$ be fixed. How should the $n_h$ be determined?

Let the cost function be

$$C = c_0 + \sum_{h=1}^{H} n_h c_h,$$

where $c_0$ denotes the general expenses and $c_h$ the expense of getting data from one unit in $U_h$.

Cost can be reduced by not sampling units from strata $U_h$ with large $c_h$. But what effect does this have on the variance?

There is a compromise. The optimal set of $n_h$ is the one that minimizes the product of cost and variance, $C \cdot V(\hat{Y})$.


Optimal allocation of sample

Under stratified SI it is achieved with

$$n_h \propto \frac{N_h \cdot S_{yh}}{\sqrt{c_h}},$$

where the proportionality constant is determined from the condition $n = \sum_{h=1}^{H} n_h$, and

$$S_{yh} = \sqrt{\frac{1}{N_h - 1} \sum_{U_h} (y_k - \bar{Y}_h)^2}$$

is the standard deviation of $y$ in stratum $U_h$, with $\bar{Y}_h$ the stratum mean.

We see: more sample from larger strata and from strata with larger $y$-variability, less sample from high-cost strata.

For constant cost across strata (telephone, post, internet) we get

$n_h \propto N_h \cdot S_{yh}$, where the proportionality constant is $\dfrac{n}{\sum_{h=1}^{H} N_h \cdot S_{yh}}$.

This is Neyman allocation (often used in enterprise surveys).
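The allocation rule can be sketched in code as below. The stratum sizes, standard deviations and costs are invented for illustration, and `optimal_allocation` is a hypothetical helper; it returns unrounded $n_h$ and does not enforce a minimum stratum sample size or $n_h \le N_h$, which a real survey would need to handle.

```python
from math import sqrt

def optimal_allocation(N_h, S_h, n, c_h=None):
    """Allocate n so that n_h is proportional to N_h * S_h / sqrt(c_h).

    With c_h omitted (equal costs) this is Neyman allocation.
    Returns unrounded values that sum to n.
    """
    if c_h is None:
        c_h = [1.0] * len(N_h)
    scores = [N * S / sqrt(c) for N, S, c in zip(N_h, S_h, c_h)]
    total = sum(scores)
    return [n * score / total for score in scores]

# Hypothetical strata: a small variable stratum and a large homogeneous one.
neyman = optimal_allocation([100, 300], [6.0, 2.0], n=60)
with_costs = optimal_allocation([100, 300], [6.0, 2.0], n=60, c_h=[1.0, 4.0])
print(neyman)       # equal costs
print(with_costs)   # second stratum four times as expensive per unit
```

With equal costs the small but highly variable stratum receives as much sample as the stratum three times its size; making the second stratum more expensive shifts sample towards the first.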


Task 7. The table gives a population of hhs. There are 3 strata, and the known auxiliary variable is household size. Find the mean and standard deviation of hh size in each stratum. Let the total sample size be 8. Perform a Neyman allocation of the sample to the strata, using the hh size variable. NB! The minimal sample size in a stratum should be 2.

Stratum   HH   HH size
1         1    4
1         2    3
1         3    4
2         1    4
2         2    6
2         3    4
2         4    7
2         5    8
3         1    2
3         2    3
3         3    2
3         4    2
3         5    2
3         6    3


Problems with allocation

• Optimal allocation is optimal for one variable only. There are many variables in a survey.

• It works well for variables whose standard deviation behaves like $S_{yh}$. For the remaining variables it may increase the variance.

• $S_{yh}$ is not known before the survey, but $n_h$ still has to be determined.

• If the variables are very different (in $S_{yh}$), it is better to use proportional allocation,

$$n_h = n \frac{N_h}{N}.$$

Proportional allocation is optimal if $S_{yh}$ is the same in every stratum. It has been shown that

$$V_{opt}(\hat{Y}) \le V_{prop}(\hat{Y}) \le V_{SI}(\hat{Y}).$$


Calibration

A method for constructing estimators that use auxiliary information in a certain way.

The theory was developed about 20 years ago (Deville and Särndal 1992).

The method is now used in statistical agencies worldwide.

The method yields new weights that can be applied to any study variable (a uniform weighting system).

Any domain total is estimated with the same weight system, summing over the sample in that domain.


Calibration is used

• for increasing precision of estimators,

• for compensating non-response,

• for achieving consistency between estimates from different surveys,

• for consistency within the same survey, if different estimators are used and, e.g., additivity is violated.


Let the totals of the auxiliary variables be known:

$\mathbf{X} = \sum_U \mathbf{x}_k$ (a vector).

Calibrated weights $w_k$ are weights for which

$$\sum_s w_k \mathbf{x}_k = \mathbf{X}. \quad (1)$$

It is also required that the $w_k$ are close to the design weights $a_k = 1/\pi_k$.

Two basic methods:

• a distance minimization method ;

• an instrument vector method.


The GREG-estimator (Särndal, Swensson, Wretman 1992) follows as a special case from both approaches.

The philosophy of calibration and GREG-estimation is described in Särndal (2007).


Calibration of weights by instrument vector method

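The weights are sought in the form

$$w_k = a_k (1 + \boldsymbol{\lambda}' \mathbf{z}_k), \quad (2)$$

where the instrument vector $\mathbf{z}_k$ has the same dimension as $\mathbf{x}_k$.

$\boldsymbol{\lambda}$ is found from the calibration constraints:

$$\mathbf{X}' = \sum_s w_k \mathbf{x}_k' = \sum_s a_k (1 + \boldsymbol{\lambda}' \mathbf{z}_k) \mathbf{x}_k'.$$

After simple manipulation,

$$\boldsymbol{\lambda}' = (\mathbf{X} - \hat{\mathbf{X}})' \Big(\sum_s a_k \mathbf{z}_k \mathbf{x}_k'\Big)^{-1},$$

where $\hat{\mathbf{X}} = \sum_s a_k \mathbf{x}_k$ is the vector of HT-estimates of the total $\mathbf{X}$.

The weights (2) satisfy the constraints (1). They are used in estimating study variable totals:

$$\hat{Y}_{CAL} = \sum_s w_k y_k.$$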
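In the one-dimensional case the matrix inverse reduces to a division, which makes the method easy to sketch. The sample, the known total and the instrument choice $z_k = x_k$ below are hypothetical, as is the function name `calibrate_1d`.

```python
def calibrate_1d(a, x, z, X_total):
    """Calibrated weights w_k = a_k (1 + lambda * z_k) for scalar x_k and z_k."""
    X_hat = sum(ak * xk for ak, xk in zip(a, x))              # HT estimate of X
    lam = (X_total - X_hat) / sum(ak * zk * xk for ak, zk, xk in zip(a, z, x))
    return [ak * (1.0 + lam * zk) for ak, zk in zip(a, z)]

# Hypothetical SI sample of 4 units from N = 20 (design weights 5),
# with known population total X = 60 of the auxiliary variable.
a = [5.0, 5.0, 5.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 7.0, 8.0]
w = calibrate_1d(a, x, z=x, X_total=60.0)

print(w)
print(sum(wk * xk for wk, xk in zip(w, x)))   # reproduces the known total X
print(sum(wk * yk for wk, yk in zip(w, y)))   # calibrated estimate of Y
```

The HT estimate of $X$ from this sample is 50, below the known 60, so all weights are adjusted upwards, with larger adjustments for units with larger $z_k$.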


Task 8. Let the population of hhs be as in Task 7. Let the sampling design in the population be SI with size $n = 8$. Take an arbitrary sample of 8 hhs. Find calibrated weights for the units of your sample. We consider the simple case of a one-dimensional auxiliary variable and instrument variable. Let the auxiliary variable $x_k$ be hh size. Fix an instrument variable (e.g. $z_k = 1/x_k$) and find the weights. Comment!


Domain estimates are calculated with the same weights but summing over the domain sample:

$$\hat{Y}_{dCAL} = \sum_{s_d} w_k y_k.$$

The instrument vector has to be fixed. Usually

$\mathbf{z}_k = q_k \mathbf{x}_k$,

where the numbers $q_k > 0$. This choice gives the GREG-estimator.

Different choices of $\mathbf{z}_k$ give many special cases that were known earlier.


Post-stratification estimator

Post-strata are subgroups of $U$ that are not used for sampling. They are used at the estimation stage. The sample is divided into post-strata and the estimator is formed as described below.

Let $\mathbf{z}_k = q_k \mathbf{x}_k = \boldsymbol{\delta}_k$, where

$\boldsymbol{\delta}_k = (\delta_{1k}, \delta_{2k}, \ldots, \delta_{Dk})$

is the indicator vector of the post-stratum. In the calibration estimator we now have

$\mathbf{X} = \sum_U \boldsymbol{\delta}_k = (N_1, N_2, \ldots, N_D)$, the sizes of the post-strata;

$\hat{\mathbf{X}} = \sum_s a_k \boldsymbol{\delta}_k = (\hat{N}_1, \hat{N}_2, \ldots, \hat{N}_D)$, the estimates of the post-strata sizes;

$w_k = a_k \dfrac{N_d}{\hat{N}_d}$, if $k \in s_d$, the calibrated weight. Prove it!


Calibrated estimator for a post-stratum total,

$$\hat{Y}_{dCAL} = N_d \frac{\hat{Y}_d}{\hat{N}_d},$$

and for the population total (post-stratified estimator),

$$\hat{Y}_{CAL} = \sum_{d=1}^{D} N_d \frac{\hat{Y}_d}{\hat{N}_d}, \quad (3)$$

where $\hat{Y}_d = \sum_{s_d} a_k y_k$ and $\hat{N}_d = \sum_{s_d} a_k$.

The estimator (3) is the simplest way to compensate for non-response:

• post-strata should be formed so that respondents and non-respondents are similar within them ($N_d$ should be known);

• $\hat{Y}_d$ and $\hat{N}_d$ are calculated using the respondents in post-stratum $d$; $\hat{Y}_d / \hat{N}_d$ estimates the mean in the post-stratum;

• the mean of the respondents is transferred to all units.
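Estimator (3) can be sketched directly from its definition. The data and post-stratum labels below are hypothetical, and `poststratified_total` is an illustrative helper; it assumes every post-stratum with a known size has at least one sampled (responding) unit.

```python
def poststratified_total(a, y, post, N_d):
    """Post-stratified estimate: sum over d of N_d * Y_hat_d / N_hat_d."""
    N_hat, Y_hat = {}, {}
    for ak, yk, d in zip(a, y, post):
        N_hat[d] = N_hat.get(d, 0.0) + ak          # estimated post-stratum size
        Y_hat[d] = Y_hat.get(d, 0.0) + ak * yk     # HT estimate of post-stratum total
    # Requires at least one sampled unit in every post-stratum listed in N_d.
    return sum(N_d[d] * Y_hat[d] / N_hat[d] for d in N_d)

# Hypothetical sample of 4 units with equal design weights and two post-strata.
a = [10.0, 10.0, 10.0, 10.0]
y = [1.0, 3.0, 10.0, 20.0]
post = ["A", "A", "B", "B"]
N_d = {"A": 30, "B": 10}          # known post-stratum sizes

ht = sum(ak * yk for ak, yk in zip(a, y))
ps = poststratified_total(a, y, post, N_d)
print(ht, ps)
```

In this invented example post-stratum B is over-represented in the sample relative to its known size, so the post-stratified estimate is pulled well below the plain HT estimate.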


Task 9. What form does formula (3) take under SI sampling in the entire population with sample size $n$?


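Calibration for consistency between surveys: AC calibration

We have two surveys, the PRS and the RFS.

$\mathbf{y}_k$ is a variable common to both surveys. Its total $\mathbf{Y}_0$ is estimated in the RFS.

We want the domain totals of the $y$-variable in the PRS to be consistent with $\mathbf{Y}_0$.

For units $k$ in the PRS sample $s$ we observe the vectors $\mathbf{x}_k$, $\mathbf{y}_k$. We know the $(p + m)$-dimensional vectors

$$\begin{pmatrix} \mathbf{x}_k \\ \mathbf{y}_k \end{pmatrix}, \; k \in s, \qquad \begin{pmatrix} \mathbf{X} \\ \mathbf{Y}_0 \end{pmatrix}, \qquad \begin{pmatrix} \hat{\mathbf{X}} \\ \hat{\mathbf{Y}} \end{pmatrix},$$

where $\mathbf{X} = \sum_U \mathbf{x}_k$ and $\mathbf{Y}_0$ are known, and $\hat{\mathbf{X}} = \sum_s a_k \mathbf{x}_k$ and $\hat{\mathbf{Y}} = \sum_s a_k \mathbf{y}_k$ are HT estimators in the PRS.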


AC calibration

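Let $\mathbf{z}_k^*$ be an instrument vector of matching dimension. Then the AC calibrated weights are given by

$w_k^* = a_k (1 + \boldsymbol{\lambda}^{*\prime} \mathbf{z}_k^*)$, where

$$\boldsymbol{\lambda}^{*\prime} = \begin{pmatrix} \mathbf{X} - \hat{\mathbf{X}} \\ \mathbf{Y}_0 - \hat{\mathbf{Y}} \end{pmatrix}' \mathbf{M}^{-1}, \qquad \mathbf{M} = \sum_s a_k \mathbf{z}_k^* \begin{pmatrix} \mathbf{x}_k \\ \mathbf{y}_k \end{pmatrix}'.$$

The $w_k^*$ satisfy the A constraint $\sum_s w_k^* \mathbf{x}_k = \mathbf{X}$ and the C constraint,

$$\sum_s w_k^* \mathbf{y}_k = \mathbf{Y}_0. \quad (1)$$

The $w_k^*$ can be used for estimating all totals of interest in the PRS.

In particular, the vector of common variable domain totals, $\mathbf{Y}_d$, is estimated as

$$\hat{\mathbf{Y}}^*_{dCAL} = \sum_{s_d} w_k^* \mathbf{y}_k, \quad d \in D.$$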


We have additive consistency with the RFS estimator $\mathbf{Y}_0$, since the C constraint (1) implies $\sum_{d \in D} \hat{\mathbf{Y}}^*_{dCAL} = \mathbf{Y}_0$.

The weights $w_k^*$ can be routinely computed, like ordinary calibration weights.

However, we want to know how the A calibration and the C calibration are related to each other.


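Derivation

Let $\mathbf{z}_k^{*\prime} = q_k (\mathbf{x}_k', \mathbf{y}_k')$ and $\mathbf{z}_k = q_k \mathbf{x}_k$. Then for the weights

$w_k^* = a_k (1 + \boldsymbol{\lambda}^{*\prime} \mathbf{z}_k^*)$, where

$$\boldsymbol{\lambda}^{*\prime} = \begin{pmatrix} \mathbf{X} - \hat{\mathbf{X}} \\ \mathbf{Y}_0 - \hat{\mathbf{Y}} \end{pmatrix}' \mathbf{M}^{-1}, \qquad \mathbf{M} = \sum_s a_k \mathbf{z}_k^* \begin{pmatrix} \mathbf{x}_k \\ \mathbf{y}_k \end{pmatrix}',$$

we need to invert the matrix

$$\mathbf{M} = \begin{pmatrix} \mathbf{T}_{xx} & \mathbf{T}_{xy} \\ \mathbf{T}_{xy}' & \mathbf{T}_{yy} \end{pmatrix},$$

$\mathbf{T}_{xx} = \sum_s a_k q_k \mathbf{x}_k \mathbf{x}_k'$ : $p \times p$,

$\mathbf{T}_{xy} = \sum_s a_k q_k \mathbf{x}_k \mathbf{y}_k'$ : $p \times m$,

$\mathbf{T}_{yy} = \sum_s a_k q_k \mathbf{y}_k \mathbf{y}_k'$ : $m \times m$.

This is a block matrix, for which an inversion formula is readily available.

Simplifying the result to a meaningful form is the difficult part.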


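Success comes from using the residuals from regressing $\mathbf{y}_k$ on $\mathbf{x}_k$:

$\mathbf{e}_k = \mathbf{y}_k - \mathbf{B}' \mathbf{x}_k$, where $\mathbf{B} = \mathbf{T}_{xx}^{-1} \mathbf{T}_{xy}$.

The resulting AC calibrated weights are

$$w_k^* = w_k + a_k q_k \mathbf{e}_k' \mathbf{Q}^{-1} (\mathbf{Y}_0 - \hat{\mathbf{Y}}_{CAL}),$$

where $w_k$ are the A calibrated weights, $\hat{\mathbf{Y}}_{CAL} = \sum_s w_k \mathbf{y}_k$, and

$\mathbf{Q} = \sum_s a_k q_k \mathbf{e}_k \mathbf{e}_k'$ : $m \times m$.

The dimensionality of the matrix inversion is reduced.

$\mathbf{Q}$ is positive definite and therefore invertible.

The weights $w_k^*$ can be applied to any study variable in any domain. If they are applied to the common variables, consistency with $\mathbf{Y}_0$ is achieved.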
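The residual form of the AC weights can be illustrated in the scalar case $p = m = 1$, where every matrix inverse is a plain division. The numbers and the function name `ac_calibrate_1d` below are hypothetical; the sketch only checks that the resulting weights meet both the A and the C constraint.

```python
def ac_calibrate_1d(a, x, y, X_total, Y0, q=None):
    """AC calibrated weights for scalar x_k and y_k, using z_k = q_k * x_k."""
    if q is None:
        q = [1.0] * len(a)
    T_xx = sum(ak * qk * xk * xk for ak, qk, xk in zip(a, q, x))
    T_xy = sum(ak * qk * xk * yk for ak, qk, xk, yk in zip(a, q, x, y))

    # Step 1: ordinary (A) calibration on x.
    X_hat = sum(ak * xk for ak, xk in zip(a, x))
    lam = (X_total - X_hat) / T_xx
    w = [ak * (1.0 + lam * qk * xk) for ak, qk, xk in zip(a, q, x)]
    Y_cal = sum(wk * yk for wk, yk in zip(w, y))

    # Step 2: residuals from regressing y on x, then the C adjustment.
    B = T_xy / T_xx
    e = [yk - B * xk for xk, yk in zip(x, y)]
    Q = sum(ak * qk * ek * ek for ak, qk, ek in zip(a, q, e))
    return [wk + ak * qk * ek * (Y0 - Y_cal) / Q
            for wk, ak, qk, ek in zip(w, a, q, e)]

# Hypothetical sample; X = 60 is a known total, Y0 = 110 comes from another survey.
a = [5.0, 5.0, 5.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 7.0, 8.0]
w_star = ac_calibrate_1d(a, x, y, X_total=60.0, Y0=110.0)

print(sum(wk * xk for wk, xk in zip(w_star, x)))   # A constraint: equals X
print(sum(wk * yk for wk, yk in zip(w_star, y)))   # C constraint: equals Y0
```

The C adjustment leaves the A constraint intact because the residuals satisfy $\sum_s a_k q_k e_k x_k = 0$ by construction.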


Thank you!
