31
Motivation Microsimulation via Quantiles Empirical studies A Unit-level Quantile Nested Error Regression Model for Domain Prediction Timo Schmid, Nikos Tzavidis, Nicola Salvati and Beate Weidenhammer Workshop Pisa July 2016 Schmid, Tzavidis, Salvati and Weidenhammer 1 (31) Microsimulation via Quantiles for Domain Prediction

A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

A Unit-level Quantile Nested ErrorRegression Model for Domain Prediction

Timo Schmid, Nikos Tzavidis, Nicola Salvati and BeateWeidenhammer

Workshop

PisaJuly 2016

Schmid, Tzavidis, Salvati and Weidenhammer 1 (31) Microsimulation via Quantiles for Domain Prediction

Page 2: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Overview

Motivation

Microsimulation via Quantiles

Empirical studiesModel-based simulations

Schmid, Tzavidis, Salvati and Weidenhammer 2 (31) Microsimulation via Quantiles for Domain Prediction

Page 3: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

What is the problem?

I Surveys are designed to produce reliable estimates at theNational level and also at pre-specified sub-National levels

I What happens if after collecting the data we are interested inproducing estimates for groups/areas consisting of veryfew observations?

I Using only the sample data will result in estimates that havehigh sampling variance.

I Small area estimation is concerned with the development ofstatistical procedures for producing efficient estimates fordomains (planned or unplanned) with small or zero samplesizes.

Schmid, Tzavidis, Salvati and Weidenhammer 3 (31) Microsimulation via Quantiles for Domain Prediction

Page 4: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Illustation using Mexican data - Gini coefficient

Direct Estimator Small Area Estimator

0.40

0.45

0.50

0.55

0.60

0.65

0.70

(white - municipalities with zero sample sizes)

Schmid, Tzavidis, Salvati and Weidenhammer 4 (31) Microsimulation via Quantiles for Domain Prediction

Page 5: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Domain Prediction of Non-linear Indicators

I Dominated recent SAE literature

I Motivated by growing needs of statistics agencies

I Examples

I Estimate the income distributionI Estimate poverty and inequality indicators

I SAE Data Requirements:I Survey Data: Available for y and for x related to yI Census/Administrative Data: Available for x but not for y

Auxiliary information available for every unit in the population isneeded.

Schmid, Tzavidis, Salvati and Weidenhammer 5 (31) Microsimulation via Quantiles for Domain Prediction

Page 6: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Non-linear Indicators

I FGT measures (Foster et al.,1984)

FGT (α, t) =N∑i=1

( t − yit

)α1(yi ≤ t)

α = 0 - Head Count Ratio; α = 1 - Poverty Gap

I Gini coefficient

I Quintile Share Ratio

QSR80/20 =

N∑i=1

[yi1(yi > q0.8)]

N∑i=1

[yi1(yi ≤ q0.2)]

Schmid, Tzavidis, Salvati and Weidenhammer 6 (31) Microsimulation via Quantiles for Domain Prediction

Page 7: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Recent Methodologies

I The World Bank method(Elbers et al., 2003, Econometrica)

I The Empirical Best Predictor (EBP) method(Molina & Rao, 2010, CJS)

I EBP based on normal mixtures(Lahiri and Gershunskaya, 2011; Elbers & Van der Weide, 2014)

I Methods based on M-Quantiles(Marchetti et al., 2012, CSDA)

Schmid, Tzavidis, Salvati and Weidenhammer 7 (31) Microsimulation via Quantiles for Domain Prediction

Page 8: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

The EBP Method

I Point of departure: Nested error regression model Battese, Harter &Fuller (1988, JASA)

Notation: (k =domain, i =individual)

yik = xTikβ + zT

ikuk + εik , i = 1, ..., nk , k = 1, ...,D

I Use sample data to estimate β, σu, σε, uk

I Generate u∗k ∼ N(0, σ2u ∗ (1− γk)) & ε∗ik ∼ N(0, σ2

ε )

Micro-simulating a synthetic population:

I Generate a synthetic population under the model a large number oftimes each time estimating the target parameter

I Linear and non-linear indicators can be computed

Schmid, Tzavidis, Salvati and Weidenhammer 8 (31) Microsimulation via Quantiles for Domain Prediction

Page 9: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Motivating Alternative Methods

I EBP relies on assumptions about the distribution of the data

I What if these fail?

I Option 1: Explore the use of transformations

I Deciding on appropriate transformations requires some work

I Option 2: EBP formulation under an alternative distribution

I Option 3: Estimate the quantile function of the targetempirical distribution

I Do this by using a nested error regression type model

Schmid, Tzavidis, Salvati and Weidenhammer 9 (31) Microsimulation via Quantiles for Domain Prediction

Page 10: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Overview

Motivation

Microsimulation via Quantiles

Empirical studiesModel-based simulations

Schmid, Tzavidis, Salvati and Weidenhammer 10 (31) Microsimulation via Quantiles for Domain Prediction

Page 11: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Alternative View

I Let x denote a set of covariates and k a domain

I Let Qy |x ,k(q|x , k) denote the quantile function of an unknownF (y |x , k)

I Interested in estimating this quantile function

I Simplest case: Assume a linear random effects model for thequantiles

Qy |x ,k(q|x , k) = xTikβq + vk

I vk domain random effect capturing unobservedheterogeneity

Schmid, Tzavidis, Salvati and Weidenhammer 11 (31) Microsimulation via Quantiles for Domain Prediction

Page 12: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Model for the Quantiles

I Qy |x,k(q|x , k) is estimated by using the link between quantileregression and the ALD(µq, σ, q) likelihood (Geraci & Bottai, 2014)

with µq = xTikβq + vk

I vk ∼ N(0, σ2v ); Other specifications are possible (Non-parametric)

I The approach for fitting is done separately at each q

I Estimation by ML, obtain βq, σq, σ2v ,q

I Plug-in predictor v is q-specific

I The predictor for the q-quantile of y given x in domain k is

Qy |x,k(q|x , k) = xTik βq + vqk

Schmid, Tzavidis, Salvati and Weidenhammer 12 (31) Microsimulation via Quantiles for Domain Prediction

Page 13: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Micro-simulation & Domain Prediction

I Interested in domain-specific finite population parameters

I Some target indicators heavily depend on the tails of thedistribution

I Capture the shape of the target distribution

I Define a fine grid of quantiles qg = {0.01, 0.02, . . . , 0.99}I Fit the quantile model for q ∈ qg

I Estimate Qy |x ,k(q|x , k)

I One additional step: Draw Monte-Carlo samples from byusing the estimated Quantile function

Schmid, Tzavidis, Salvati and Weidenhammer 13 (31) Microsimulation via Quantiles for Domain Prediction

Page 14: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Micro-simulation & Domain Prediction

I Use the inverse transform method

I Draw h = 1, . . . ,MC , Uniform random variables ∼ (0, 1)

I Invert the corresponding quantile function, obtain yik

yik = (y(1)ik , y

(2)ik , . . . , y

(MC)ik )

I Leads to microsimulation in domain k

yk = (y(1)1k , . . . , y

(MC)1k , y

(1)2k , . . . , y

(MC)2k , . . . , y

(1)Nkk

, . . . , y(MC)Nkk

)

Schmid, Tzavidis, Salvati and Weidenhammer 14 (31) Microsimulation via Quantiles for Domain Prediction

Page 15: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Micro-simulation & Domain PredictionI Estimate domain target parameter from

yk = (y(1)1k , . . . , y

(MC)1k , y

(1)2k , . . . , y

(MC)2k , . . . , y

(1)Nkk

, . . . , y(MC)Nkk

)

I Domain average

meank = mean(yk) =1

Nk ·MC

Nk∑i=1

MC∑mc=1

y(mc)ik

I Domain median

mediank = median(yk)

I Head Count Ratio

HCRk =1

Nk ·MC

Nk∑i=1

MC∑mc=1

I (y(mc)ik ≤ t)

Schmid, Tzavidis, Salvati and Weidenhammer 15 (31) Microsimulation via Quantiles for Domain Prediction

Page 16: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Constrained Fitting

I Currently the fitting process is done separately for each q

I Follows Lum & Gelfand (Bayesian Anal., 2012) ; Geraci &Bottai (Stats & Comp, 2014)

I Joint quantile specification is complex

I Alternative: Constrain the fitting process

I Allows for one common random effect across q

I Allows implicitly for correlation

I Can impose monotonicity at the same time

Schmid, Tzavidis, Salvati and Weidenhammer 16 (31) Microsimulation via Quantiles for Domain Prediction

Page 17: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

MvQ: MSE Estimation

Bootstrap:

I Select q from U ∼ (0, 1)I Generate v∗k

I Using the assumed distributionI Non-parametrically with rescaling to adjust for shrinkage

I Generate ε∗ikI Option 1: Re-sample from the empirical distribution of

residuals appropriately rescaledI Option 2: Investigate the use of wild bootstrap to

accommodate the non-id case (Feng et al., Biometrika, 2011)

y∗ik = xTik βq + v∗k + ε∗qik

Schmid, Tzavidis, Salvati and Weidenhammer 17 (31) Microsimulation via Quantiles for Domain Prediction

Page 18: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

MvQ: MSE Estimation

I Construct B bootstrap populations

I For each population b compute the population indicators, θ∗bkI From each bootstrap population select a bootstrap sample

I Implement the MvQ with the bootstrap sample, get θ∗b

MSE (θk) = B−1B∑

b=1

(θ∗bk − θ∗bk )2

Schmid, Tzavidis, Salvati and Weidenhammer 18 (31) Microsimulation via Quantiles for Domain Prediction

Page 19: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Discrete Outcomes - Counts

I Goal: Estimate the distribution function of y

I BUT: Quantiles of discrete y are not continuous

I Jittering: Machado & Santos Silva (JASA, 2005)

I Impose smoothness by adding to y random noise from adistribution F with support in (0, 1)

I Example: F ∼ Uniform(0, 1)

I 1− 1 relationship between the quantiles of the count andthose of the jittered outcome

I Nested error model for quantiles of jittered outcome

I Estimates of the quantiles of discrete y obtained viaappropriate backtransformation

Schmid, Tzavidis, Salvati and Weidenhammer 19 (31) Microsimulation via Quantiles for Domain Prediction

Page 20: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Theory - Consistency (Weidenhammer, 2016)For a fixed q ∈ (0, 1) the quantile estimator

Qy |x ,k(q|x , k) = xTikβq + vk k = 1, ...,D, and i = 1, ..., nk

is consistent for D →∞ and nk →∞.

Proof (Steps):

1. Consistency of the parameter estimates θ = (βq, σq, σ2v ,q)

- Define θ to be the maximum likelihood estimator of θ- Showing consistency of θ relies on Weiss (1971, 1974)- The construction of the proof follows Miller (1977) & Pinheiro

(1994)- Two conditions for showing asymptotic normality θ

2. Consistency of the random effect vk

Schmid, Tzavidis, Salvati and Weidenhammer 20 (31) Microsimulation via Quantiles for Domain Prediction

Page 21: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Theory - Consistency (Step 1)

Let `(θ|y) be the likelihood of the observed data y and θ0 is theunknown value of θ

I Assumption 1: X

− 1

K (n)

∂2

∂θi∂θj`(θ|y)

∣∣∣∣θ0

P→ B(θ0)

where B(θ0) is a positive definite matrix

I Assumption 2: XConvergence in 1 has to be at a certain rate for all θ in aneighborhood of θ0

Schmid, Tzavidis, Salvati and Weidenhammer 21 (31) Microsimulation via Quantiles for Domain Prediction

Page 22: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Overview

Motivation

Microsimulation via Quantiles

Empirical studiesModel-based simulations

Schmid, Tzavidis, Salvati and Weidenhammer 22 (31) Microsimulation via Quantiles for Domain Prediction

Page 23: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Model-based simulations

Population data is generated for D = 50 areas with N = 200 via

yik = 4500− 400xik + uk + εik

I Covariates xik ∼ N(µk , 32) with µk ∼ U(−3, 3)

I Random effects uk ∼ N(0, 5002)

I Unbalanced design leading to a sample size of n = 921(min = 8, mean = 18.4, max = 29)

I 100 Monte Carlo replicates with B=100 bootstraps

Scenarios:

I Normality: εik ∼ N(0, 10002)

I Contamination: εik ∼ 0.98N(0, 10002) + 0.02N(0, 60002)

I Heteroscedasticity: εik = (1 + 0.1xik)e with e ∼ N(0, 10002)

Schmid, Tzavidis, Salvati and Weidenhammer 23 (31) Microsimulation via Quantiles for Domain Prediction

Page 24: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Estimating the Domain Distribution - Normality

Income

Den

sity

0.00000

0.00005

0.00010

0.00015

0.00020

0.00025

0 5000 10000

Area with n=20 Area with n=29

Area with n=8

0 5000 10000

0.00000

0.00005

0.00010

0.00015

0.00020

0.00025

Area with n=15

Estimated Underlying

Schmid, Tzavidis, Salvati and Weidenhammer 24 (31) Microsimulation via Quantiles for Domain Prediction

Page 25: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Estimating the Domain Distribution - Contamination

Income

Den

sity

0.00000

0.00005

0.00010

0.00015

0.00020

0.00025

−5000 0 5000 10000 15000

Area with n=20 Area with n=29

Area with n=8

−5000 0 5000 10000 15000

0.00000

0.00005

0.00010

0.00015

0.00020

0.00025

Area with n=15

Estimated Underlying

Schmid, Tzavidis, Salvati and Weidenhammer 25 (31) Microsimulation via Quantiles for Domain Prediction

Page 26: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Quality Measures

Root mean square error:

RMSEk =

√√√√ 1

R

R∑r=1

(θk,r − θk,r

)2Relative bias [%]:

RBk =1

R

R∑r=1

θk,r − θk,rθk,r

· 100

Absolute bias [%]:

Biask =1

R

R∑r=1

θk,r − θk,r

Target parameters: Gini coefficient (Gini) and 25% quantile of the smallareas

Schmid, Tzavidis, Salvati and Weidenhammer 26 (31) Microsimulation via Quantiles for Domain Prediction

Page 27: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Performance under Normality0.

010.

020.

030.

040.

05

MvQ EBP normal direct

● ●

●●

RMSE Gini

−0.

03−

0.02

−0.

010.

00

MvQ EBP normal direct

●●

●●

Absolute Bias Gini

200

300

400

500

600

● ●

RMSE Quant025

−2

02

46

8

● ●

●●●

Relative Bias Quant025

Schmid, Tzavidis, Salvati and Weidenhammer 27 (31) Microsimulation via Quantiles for Domain Prediction

Page 28: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Performance under Contamination0.

020.

030.

040.

050.

060.

070.

08

MvQ EBP normal direct

RMSE Gini

−0.

03−

0.02

−0.

010.

000.

010.

02

MvQ EBP normal direct

Absolute Bias Gini

200

300

400

500

600

RMSE Quant025

−5

05

10

●●

●●

Relative Bias Quant025

Schmid, Tzavidis, Salvati and Weidenhammer 28 (31) Microsimulation via Quantiles for Domain Prediction

Page 29: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Performance under Heteroscedasticity0.

020.

030.

040.

050.

06

MvQ EBP normal direct

●●

●●

RMSE Gini

−0.

03−

0.02

−0.

010.

00

MvQ EBP normal direct

●●

Absolute Bias Gini

200

300

400

500

600

700

●●

●●

RMSE Quant025

−2

02

46

8

●●

●●

Relative Bias Quant025

Schmid, Tzavidis, Salvati and Weidenhammer 29 (31) Microsimulation via Quantiles for Domain Prediction

Page 30: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Remarks and next steps

Remarks:

I Methodology can be used for domain estimation with countoutcomes.

I Proposed bootstrap approach (count and continuous case)reveal promising results.

Next steps:

I Constrained quantile regression to avoid quantile crossing.

I Relax parametric assumptions regarding the random effects.

Schmid, Tzavidis, Salvati and Weidenhammer 30 (31) Microsimulation via Quantiles for Domain Prediction

Page 31: A Unit-level Quantile Nested Error Regression Model for ... · Overview Motivation Microsimulation via Quantiles Empirical studies Model-based simulations Schmid, Tzavidis, Salvati

MotivationMicrosimulation via Quantiles

Empirical studies

Thank you very much for your attention.

Timo SchmidNikos TzavidisBeate WeidenhammerNicola Salvati

Schmid, Tzavidis, Salvati and Weidenhammer 31 (31) Microsimulation via Quantiles for Domain Prediction