A Bayesian analysis of survey design parameters€¦ · Symposium StatCan, March 22-24, 2016,...

Symposium StatCan, March 22-24, 2016, Ottawa

A Bayesian analysis of survey design parameters

Lisette Bruin, Nino Mushkudiani, Barry Schouten

Summary

• Introduction;

• Adaptive survey design

• BADEN

• Objectives;

• Design (definitions, notations, model);

• Bayesian analysis (general approach, priors and posteriors);

• Simulation study (goals, approach, results);

• Future work/discussion

Bayesian Adaptive survey DEsign NetworkInternational network on targeted data collection strategies employing

historical survey data

Jan 2015 – Dec 2017, funded by The Leverhulme Trust

• Participating institutes:

• Universities of Manchester (PI), Michigan and Southampton,

• CBS, RTI, Stat Sweden, US Census Bureau;

• Goals:

• Add expert knowledge and knowledge from historic survey

data in monitoring and analysis (phase 1);

• Adjustment/adaptation (phase 2);

Adaptive survey design

Adaptive survey designs differentiate survey design features for different

population subgroups based on auxiliary data about the sample obtained

from frame data, registry data or paradata.

Instead of a single (uniform) strategy multiple candidate strategies can be

Why adaptive survey designs?

• Response: persons have different preferences for communication and

interview, i.e. respond differently to different data collection strategies;

• Costs: different strategies are associated with different costs per

person;

Objectives

• To set up a general model for survey design parameters;

• To introduce a Bayesian analysis of survey design parameters;

• To introduce a Bayesian analysis of quality and cost indicators

based on survey design parameters;

Survey design parameters

Three sets of survey design parameters suffice to compute most of

the quality and cost constraints:

• 𝜌𝑖(𝑠1,𝑇) : Response propensities per unit per strategy;

• 𝐶𝑖(𝑠1,𝑇) : Expected costs per sample unit per strategy;

• 𝐷𝑖 𝑠1,𝑇 : Adjusted mode effects per unit per strategy;

We restrict to nonresponse error and leave the adjusted mode effects

to future papers.

Functions of survey design parameters

We consider three functions of the design parameters:

• the response rate

𝑅𝑅(𝑠1,𝑇) =1

𝑖=1

𝑑𝑖𝜌𝑖 (𝑠1,𝑇)

• the total cost

𝐵(𝑠1,𝑇) =

𝑖=1

𝑐𝑖 (𝑠1,𝑇)

• the coefficient of variation

𝐶𝑉 𝑋, 𝑠1,𝑇 =

1𝑁 𝑖=1𝑛 𝑑𝑖(𝜌𝑖 𝑠1,𝑇 − 𝑅𝑅(𝑠1,𝑇))

𝑅𝑅(𝑠1,𝑇)

Definitions

• Actions

• Choices for design features (number of calls, use of incentive,

interview mode)

• Strategy

• The total of choices made for the design features, denoted by

𝑠1,𝑇• Phase

• 𝑇 phases of survey design 𝑡 = 1,2, … , 𝑇• Auxiliary data

• A vector 𝑥𝑖 that is linked from frame data, administrative data

(𝑥0,𝑖) or paradata (𝑥𝑡,𝑖)

If 𝑥𝑖 = 𝑥0,𝑖, then the ASD is static. If for some 𝑡, 𝑥𝑡,𝑖 is used to choose

actions in a subsequent phase, then the ASD is dynamic.

Modeling survey design parameters

A simple, but sufficiently general model including all potential features:

• more than 1 phase

• dynamic

• dependency on history of actions

• non-eligible nonresponse for follow-up

Modeling:

1. Decomposition of model parameters into their main components

2. General linear models that link these components to the available

auxiliary variables

3. Assumption that cost, contact and participation per sample unit are

independent of those of other sample units

Bayesian analysis

General approach:

1. Assume independency of parameters;

2. Assign prior distributions;

3. Derive likelihood functions;

4. Derive approximations to posterior distributions of design

parameters;

5. Derive approximations to posterior distributions of aggregate

quality and cost measures (functions of design parameters).

Parameters in prior distribution (hyperparameters):

• Expert knowledge

• Historic survey data

Bayesian analysis

Posterior distributions

Joint posterior distributions of interest:

1. Individual response propensities and costs – optimization parameters

2. Overall quality and cost indicators – monitoring analysis

Required observed data:

• Realized costs

• Response outcomes

• Used strategies

• Auxiliary data

Bayesian analysis

Posterior distributions

No closed forms: Posterior distributions of response propensities and costs

(and overall quality and cost indicators) do not have closed forms.

Proposal: Draw MCMC samples from the posterior distributions of the

regression parameters in the contact, participation and cost models.

Advantage: Posterior distributions of overall quality and cost indicators follow

directly from the samples.

Obvious choice: Gibbs sampler to iterate draws for each parameter

separately (some conditional distributions still without closed forms)

Simulation study

Analyse the impact of:

- Misspecified prior distributions;

- Dispersion of prior distributions (non-informative vs informative);

- Sample size;

Additionally, investigate:

- Convergence properties and computation times;

Simulation study

Simulation

- Three phases: CAWI → CAPI3 → CAPI3+

- Simulation based on known parameters from the Health Survey

Decomposition

• Response per phase: 𝜌𝑡,𝑖 𝑠1,𝑡 = 𝜅𝑡,𝑖 𝑠1,𝑡 ⋅ 𝜆𝑡,𝑖 𝑠1,𝑡

• Costs per phase: 𝐶𝑡,𝑖(𝑠1,𝑡)= 𝐶0,𝑡,𝑖 𝑠1,𝑡 + 𝐶𝑅,𝑡,𝑖 𝑠1,𝑡 + 𝐶𝑁𝑅,𝑡,𝑖(𝑠1,𝑡)

Models

• Contact and participation: probit regression on age, gender and 0-1

indicator for web break-off;

• Costs: linear regression on age and gender;

Simulation study

Prior distributions (hyperpriors):

• Inverse Gamma: variance parameters in error terms of cost

functions;

• Normal distribution: all other regression parameters;

Posterior distributions:

• Approximated using a Gibbs sampler with data augmentation;

Simulation study

Chosen priors:

• ‘True’;

Computed and estimated with regression from original response

• Misspecified;

‘True’ prior with the mean of 𝛼0 multiplied by 3

• Non-informative;

Same standard deviations as ‘true’ prior, but equally distributed

• Non-informative with a larger variance;

Standard deviation multiplied by √10

Simulation study - results

True prior vs misspecified prior (𝜇𝛼0 multiplied by 3)

• A misspecified prior has no impact in the first phase with one covariate

• The impact of a misspecified prior is larger in a smaller dataset

• In a smaller dataset the variances of the posterior distributions are larger17

True prior vs non-informative priors

• A non-informative prior has no clear impact in the first phase with one covariate

• Smaller differences with response rate when the variances are larger

• The R-indicator is smaller for non-informative priors with a larger variance18

Conclusions

Misspecified prior distributions

• Larger impact in a smaller dataset

Non-informative vs informative

• The R-indicator is larger for non-informative priors with the same variance

• The R-indicator is smaller for non-informative priors with a large variance

• Smaller differences with true prior or response rate when the variance is

larger

Sample size

• Impact on misspecified priors and posterior variance

Convergence

• Short burn-in period

Computation times

• Larger computation times with larger variances and misspecified priors

Future work/discussion

Future work

Priors

• Translation of expert knowledge and historic survey data to

hyperparameters in prior distributions;

• Use of power priors to moderate impact of history;

Models

• Costs model;

• Dependency over phases (same mode in different phases);

• Inclusion of models for survey outcome variables and mode effects;

A Bayesian analysis of survey design parameters€¦ · Symposium StatCan, March 22-24, 2016,...

Documents

Bayesian inference and Bayesian model selection · • specify a joint probability distribution over all variables (observations and parameters) • require a likelihood function

Bayesian inference of non-linear multiscale model parameters … · 2020. 6. 23. · Bayesian inference of non-linear multiscale model parameters accelerated by a Deep Neural Network

Bayesian Estimation of Disease Prevalence and the ... · Bayesian Estimation of Disease Prevalence and the Parameters of Diagnostic Tests in the Absence of a Gold Standard Lawrence

Using Bayesian Growth Models to Predict Grape Yield · calendar days? • Secondly, would factors like temperature, rainfall, or solar radiation impact the parameters of the Bayesian

PROC MCMC Application-Time Series Forecasting Submissionmade using these methods. In contrast, the Bayesian approach is to address unknown parameters (or uncertainty about parameters)

Lecture 8: Bayesian Estimation of Parameters in State Space …ssarkka/course_k2016/handout8.pdf · Simo Särkkä Lecture 8: Bayesian Estimation of Parameters. Maximum a posteriori

A Bayesian Approach to Predict Solubility Parameters€¦ · A Bayesian approach to predict solubility parameters Benjamin Sanchez Lengeling 1, Lo c M. Roch , Jos e Dar o Perea2,

Lectures 3: Bayesian estimation of spatial regression models · ALL Bayesian econometric theory in 2 overheads In econometrics we care about 1) inference about parameters, 2) model

rSMILE, an interface to the Bayesian Network package GeNIe ... · Prentice Hall, 2003. ... Error/Exception Handling Static parameters for JVM ... an interface to the Bayesian Network

An intro to ABC – approximate Bayesian computationAn intro to ABC – approximate Bayesian computation ... exact inference for model parameters , nor it is possible to approximate

Bayesian Networks Practice - SNU · Create and modify network models with a graphical editor Building Bayesian networks, influence diagrams Learning structures and parameters of Bayesian

Bayesian Selection of Regularisation parameters: Theory ... · PDF fileBayesian Selection of Regularisation parameters: Theory, Methods and Algorithms Marcelo Pereyra mp12320/ School

Stanford University...tion and maximum likelihood, Bayesian, and approximate Bayesian inference of parameters are absent. Although these topics are fairly new to population genetics,

Bayesian structured additive distributional regression · Bayesian structured additive distributional regression to estimate all parameters as functions of explanatory variables and

BAYESIAN KRIGING AND BAYESIAN NETWORK DESIGNsolution of a Bayesian prediction problem for the linear model parameters, but only under the assumption that the covariance parameters

Practical Bayesian Framework for Backpropagation Networksauthors.library.caltech.edu/13793/1/MACnc92b.pdf · Bayesian Framework for Backpropagation Networks 449 terms; and (2) parameters

Bayesian Estimation - TUThehu/SSP/lecture10.pdf · Bayesian Estimation Bayesian estimators di er from all classical estimators studied so far in that they consider the parameters

on supervised learning of bayesian network parameters - Complex

A Bayesian and - warwick.ac.uk · •(10) Bayesian partial pooling III: Latent trait MPT w/o correlation parameters (Jags) •(11) latent-class approach (Klauer, 2006) •All implemented

Bayesian Inference For The Calibration Of DSMC Parameters