Designing Household Surveys

Ec798 MA Global Development Capstone Course, Lecture 3

Dilip Mookherjee

Outline

Motivation: Why do Household Surveys?

Develop check-list of steps in planning/thinking about survey design

Sample design issues (draw upon Deaton’s book on Household Surveys, Chapter 1)

My own views and talking points: the kinds of issues that matter in practice but are not in survey design textbooks

Why Do Household Surveys?

Everybody in the development area (governments, aid agencies, academics) needs to assess the impact of development programs and policies, to get a sense of “what works” and what doesn’t

Strangely, this is something new among most development practitioners

Even the World Bank has set up program appraisal and impact evaluation as part of its routine operations only recently, so for the most part its programs have never incorporated lessons from past experience

Household surveys of course have been around for far longer, going back to studies of living standards among the poor in the UK in the 1920s, and the Indian sample surveys in the 1940s: the World Bank has run LSMS surveys in many LDCs since the mid-80s

Purpose of those surveys was to get facts, on what per capita income, inequality, poverty really were in any given country at any given time

But they were rarely used as tools for evaluating development programs

Why are HH Surveys Important in Program Evaluation?

What are the criteria by which we should evaluate any given policy or program?

Can look at effects on some macro variables such as wage rates, employment levels, exports, or at a more micro-level at effects on communities or enterprises receiving assistance etc. (eg coop profits, roads and schools built in a community etc.)

Problems with all of these: don’t have any way to evaluate the impact directly on what we all care about ultimately: household living standards

Based on an implicit judgment that increased exports, profits, infrastructure etc of the region/enterprise/community do benefit households in some significant way

Data on exports/profits/infrastructure obtained from concerned communities/regional governments may be somewhat cooked to show results, so they need independent corroboration

And the distribution of benefits within the community or enterprise is also key to understanding impact on poverty reduction

Typically most enlightened aid agencies or governments using some kind of aggregative information on collective outcomes will concede that this is rather coarse, but that designing household surveys is difficult, expensive and time-consuming

Many agencies/policy-makers/media outlets do not even see the need for policy evaluation, or are happy to rely on personal experience/anecdotes/case studies/political reactions (eg Indian govt officials involved in implementing decentralization program)

Others don’t see the value of statistical information (Schelling’s distinction between statistical and personal lives)

Survey of What?

Household surveys usually involve a sampling frame to ensure that responses are representative of a larger population, and administer detailed questionnaires to those in the sample concerning their living standards and other variables that pertain to the program being evaluated

Attempt to obtain objective, factual information that is subject to minimal psychological, perceptual biases

What’s Wrong with Perception Surveys?

One simple way may be to ask households whether they think the program has been successful/good etc without going into details

Framing effects and inconsistencies documented by psychologists: these are magnified when perceptions are elicited

Problems of comparability across individuals/communities/countries: the importance of implicit benchmarks or standards that vary widely

Problems with Perception Surveys, contd.

Responses subject to mood swings, which magnify measurement error

Problems of credibility of the response: sometimes people pass judgment without thinking that much, sometimes they pronounce judgment to ‘make a statement’ even if they don’t sincerely believe what they say

Often they are not careful to consider counter-factuals, or to unbundle programs from others they ‘associate’ with it (eg my privatization project in LA)

Sometimes it’s hard for them to assess policy impact without a detailed listing and quantification of channels of impact (psychological aggregation problems, eg Robyn Dawes’s evidence)

Level of Aggregation: Who Should be Surveyed?

Household surveys also represent a way to collect impact assessment data at a more disaggregated level than community or enterprise level information

Provides assessment of distributive impacts, and incorporates behavioral responses by households (eg effects on work, savings, investment etc.), which are key to understanding channels of impact, sources of possible bias, and ultimate effects on well-being

Is this (i.e., household level) disaggregated enough?

Arguments for Surveying Individuals

Nowadays the argument is building for surveying individuals rather than households because of:

Interest in gender empowerment effects, effects on children

Households are getting fragmented, subdivided etc. and these processes of household division themselves need to be studied

Extensive increase in migration

However this is even more difficult and expensive than surveying households (you have to track down and survey individuals)

Scope of Survey

Decide first what the purpose or scope of the survey is: be specific about (a) the programs to be evaluated and (b) the range of effects to be measured

Be sensitive to the specific context or area where it is to be carried out

Carry out exploratory visits, read background literature, talk to people on the ground, in settling on all this

Estimate costs, and compare with budget available (money, time, manpower)

Be brutally realistic about what is feasible both logistically as well as intellectually

Scoping, contd.

Look at datasets already available, and use them as departure point (e.g. last LSMS in the country)

Build on those questionnaires, but then add the questions you need to appraise the specific programs and program effects, while deleting those you do not need

Conceptual Framework: The Role of Theory

More often than not, survey design is not much influenced by any theory

I believe theory is important at all stages: it is necessary to frame the main hypotheses, select the key variables that need to be measured, think about household behavioral responses and resulting market/community/political effects that represent and influence channels of impact, and anticipate key analytical and econometric problems you will face in making inferences from the data

Often it’s difficult to theorize a lot in advance of the actual survey or study, but at least you should try to state the key hypotheses or questions precisely, the variables that need to be measured, the regression specifications to be used, and the inferences that can be drawn from the exercise

This is part of the overall scoping exercise, an assessment of what you can feasibly expect to achieve, which should guide your survey design

Back-and-forth between theory and evidence, an endlessly iterative procedure

Various Theory-Data Interfaces/Methodologies

Most ambitious and audacious: structural modeling and estimation (complete model of motivation of individual agents, their behavior, aggregation/equilibrium at market/community/regional level, followed by welfare analysis and policy simulations eg CGE models)

Intermediate: reduced form modeling (relate observable impacts to policy/program treatments, household and community characteristics, time dummies etc, without pinpointing exact causal pathways)

Low: descriptive/factual (report facts and avoid any inferences about the actual impact or how it happened)
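
As an illustration of the intermediate (reduced form) approach, one possible specification is sketched below. The notation is an assumption made here for illustration and is not taken from the lecture: y is an outcome for household i in village v at time t, T is a program treatment indicator, X and Z are household and community characteristics, and the delta terms are time dummies.

```latex
y_{ivt} = \alpha + \beta\, T_{vt} + \gamma' X_{ivt} + \theta' Z_{v} + \delta_t + \varepsilon_{ivt}
```

Here the coefficient on T summarizes the observable impact of the program without pinpointing the causal pathway through which it operates.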

Survey Frames and Coverage

Decide on relevant population, sample size, basis of drawing samples

Context matters a lot (eg urban versus rural)

Power calculations (use prior information on means and variances to calculate the sample size needed to achieve a given degree of power)

Cost also matters (eg optimal design trades off precision versus cost: if rural hh’s are dispersed and expensive to get to, then you want fewer of them)
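
A minimal sketch of such a power calculation follows, assuming a simple comparison of means between a treatment and a control group with equal variances and the standard normal approximation. The function names, the default 5% significance level and 80% power, and the design-effect adjustment for clustered samples are illustrative assumptions, not part of the lecture.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Households needed in each of two groups to detect a difference
    in means of size `effect`, given outcome standard deviation `sd`.

    Uses the normal approximation:
        n = 2 * (z_{1 - alpha/2} + z_{power})^2 * sd^2 / effect^2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


def design_effect(cluster_size, icc):
    """Inflation factor when households are sampled in clusters
    (eg villages) of `cluster_size` with intra-cluster correlation `icc`."""
    return 1 + (cluster_size - 1) * icc


if __name__ == "__main__":
    # Hypothetical inputs: detect an effect of half a standard deviation.
    n_simple = sample_size_per_arm(effect=0.5, sd=1.0)
    n_clustered = ceil(n_simple * design_effect(cluster_size=10, icc=0.05))
    print(n_simple, n_clustered)
```

The prior information on means and variances enters through `effect` and `sd`; halving the effect you want to detect roughly quadruples the required sample, which is where the trade-off with cost becomes binding.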

Strata and Clusters

Often stratify the population into areas or groups and draw a stratified random sample via a two step procedure (eg sample villages, then households)

Logistically and financially advisable (eg simple random sample of rural households would be very difficult and expensive)

Stratification also ensures representation of particular groups (eg if you want to guarantee some landless and some big landlords in the sample)
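
The two step procedure can be sketched roughly as follows. This is an illustrative sketch, not a production sampling routine: the data layout (tuples of village and household identifiers), the function name and the fixed number of draws per stratum are all assumptions. Each sampled household also gets a weight equal to the inverse of its selection probability.

```python
import random
from collections import defaultdict


def two_stage_sample(villages, households, villages_per_stratum,
                     households_per_group, seed=0):
    """villages:   list of (village_id, stratum)
    households: list of (household_id, village_id, group)
    Returns a list of dicts with a sampling weight for each household."""
    rng = random.Random(seed)

    # Stage 1: draw villages at random within each village stratum.
    villages_by_stratum = defaultdict(list)
    for village_id, stratum in villages:
        villages_by_stratum[stratum].append(village_id)

    village_prob = {}  # selection probability of each chosen village
    for stratum in sorted(villages_by_stratum):
        ids = villages_by_stratum[stratum]
        k = min(villages_per_stratum, len(ids))
        for village_id in rng.sample(ids, k):
            village_prob[village_id] = k / len(ids)

    # Stage 2: list households in chosen villages by group (eg landless,
    # landlord) and draw a fixed number from each group.
    listing = defaultdict(list)
    for household_id, village_id, group in households:
        if village_id in village_prob:
            listing[(village_id, group)].append(household_id)

    sample = []
    for village_id, group in sorted(listing):
        ids = listing[(village_id, group)]
        k = min(households_per_group, len(ids))
        for household_id in rng.sample(ids, k):
            prob = village_prob[village_id] * k / len(ids)
            sample.append({
                "household": household_id,
                "village": village_id,
                "group": group,
                "weight": 1 / prob,  # inverse selection probability
            })
    return sample
```

Drawing a fixed number from each group guarantees that small groups such as big landlords appear in the sample; the weights then undo the resulting over-representation when computing population averages.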

Missing Groups/Responses

In practice, cannot implement perfect representation of all groups (Deaton’s examples); there are ways of correcting for this by re-weighting the sample

More important, be aware of biases that may creep in as a result of missing groups, some of which can be pretty subtle but profound (eg the ‘dog that did not bark in the night’)

See if there are systematic coverage/non-response biases (eg regression of coverage on relevant characteristics)
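
One simple form of re-weighting is sketched below, assuming the population share of each group is known from an outside source such as a census. This is a generic post-stratification adjustment, not necessarily the correction Deaton describes; the function names and the numbers in the usage note are made up for illustration.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight each respondent by (population share) / (sample share)
    of the group they belong to.

    sample_groups:     list with one group label per respondent
    population_shares: dict mapping group label to its population share
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Weighted average of `values` using `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

For example, if the landless are half the population but only 2 of 10 respondents, each landless respondent gets weight 2.5 and each landed respondent weight 0.625, so the weighted mean gives both groups their population share.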

Panel versus Cross-Section?

A key question concerns timing of coverage: should you carry out a one-time survey?

This is easiest and cheapest

But then you are not able to see the impact over time: like viewing a single snapshot rather than a film

It is difficult to assess impact from a one-time survey: have to rely on comparison of treatment and control areas, which typically raises thorny selection bias problems (e.g., most govt programs or NGOs do not randomly select treatment areas or households, but target them purposively)

If you carry out a study at two or more points of time, you can see ‘before-after’ changes, or impacts as they evolve over time

Repeated cross-sections or re-survey of the same set of households (a panel)?

Depends on whether the object of analysis is the community or the individual household

Typically it is the individual household, so a panel is highly desirable

Other Advantages of Panel Data

Can view impact on individual households over time (so can answer questions such as ‘how many people escaped poverty and how?’)

Reduce problems of inference substantially by controlling for unobserved household-level heterogeneity (farm-size productivity example, nutrition-education example)

Data that has been most instrumental in furthering academic research (PSID, ICRISAT, Progresa etc.)

Structure of Progresa intervention/data: treatment and control villages, then create a household panel within each
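
The point about unobserved household-level heterogeneity can be illustrated with a small sketch. It assumes a two-period panel in which the outcome depends on a regressor x and on a fixed household effect that is correlated with x. The numbers are made up for illustration and contain no noise; the true coefficient on x is set to 2.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def first_difference_slope(x0, x1, y0, y1):
    """Regress the change in y on the change in x for each household.
    Differencing removes any household effect that is constant over time."""
    dx = [b - a for a, b in zip(x0, x1)]
    dy = [b - a for a, b in zip(y0, y1)]
    return ols_slope(dx, dy)


# Illustrative data: four households, y = 2 * x + household effect,
# where the household effect (0, 10, 20, 30) rises with x.
effects = [0, 10, 20, 30]
x0 = [0, 1, 2, 3]  # regressor in period 0
x1 = [1, 3, 3, 5]  # regressor in period 1
y0 = [2 * x + a for x, a in zip(x0, effects)]
y1 = [2 * x + a for x, a in zip(x1, effects)]

pooled = ols_slope(x0 + x1, y0 + y1)  # ignores the household effect
within = first_difference_slope(x0, x1, y0, y1)  # differences it out
```

In this constructed example the pooled regression attributes the household effect to x and overstates the coefficient, while the first-difference estimate recovers it. With real data, differencing can also amplify measurement error.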

Problems with Panel Data

Can sometimes increase measurement error by focusing on temporal changes

Logistically and financially difficult

Problems of attrition: people move around a lot!

Intermediate solutions:

Revolving Panels (eg Mexican employment survey)

Recall data in one-time surveys (but then worry about recall errors)

Variables to be Measured

How to assess living standards or economic well-being: income, consumption, assets?

Income: problem of separating permanent from transitory income, of seasonality, and of measuring income

Consumption: again hard to do in practice (eg recall period?), but sometimes there is no other way (eg nutrition, health or gender empowerment programs)

Assets: lower measurement error but difficult to translate into income or consumption or intra-household implications

LSMS

Typical LSMS measures all of these:

Household demographics, housing, access to facilities, migration

Consumption, education, health, anthropometrics, marriage

Employment, farming and livestock, non-farm enterprises

Credit and savings

Remittances and transfers

Simple LSMS questionnaire runs into 80 pages

Practical Considerations

In my experience 80 pages are too many: people’s attention span and willingness to answer questions are limited to 3 hrs per sitting, in which you can administer at most a 30 page survey

Phrasing of questions very important (have to make sure the meaning is accurately conveyed, have to use local translators and local lingo): pilot the questionnaire and watch the subjects as they answer questions

Make questions as precise as possible, develop codes for qualitative answers

Steps

Listing of all relevant villages: random selection of villages with appropriate stratification

Listing of all households in selected villages, based on a 10-minute door-to-door survey of attributes you will stratify on (eg land owned, education, gender, occupation etc.)

Select stratified random sample of households in each village

Pilot the questionnaire, two or three times successively

Train surveyors

Steps, contd.

Manually check entered answers on an ongoing basis, get errors corrected ASAP

Computerization of data entry: select entry personnel, supervise their work with self-checking algorithms

Convert data to STATA, SAS or SPSS, then create relevant variables, produce basic descriptive stats and tables, and identify data errors: go back to data entry people and, if necessary, to the paper returns and original surveyors, and clean the data

Now you are ready to start work!
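
The self-checking algorithms used to supervise data entry could look roughly like the sketch below. The field names (`household_id`, `household_size`, `n_children`, `owns_land`, `land_owned_acres`) and the plausible ranges are hypothetical; a real questionnaire would need its own rules.

```python
def check_record(record):
    """Return a list of error messages for one entered household record."""
    errors = []
    size = record.get("household_size")
    children = record.get("n_children")
    land = record.get("land_owned_acres")

    # Range check: household size must be a plausible whole number.
    if not isinstance(size, int) or not 1 <= size <= 30:
        errors.append("household_size missing or outside 1-30")

    # Consistency check: cannot have more children than members.
    if not isinstance(children, int) or children < 0:
        errors.append("n_children missing or negative")
    elif isinstance(size, int) and children > size:
        errors.append("n_children exceeds household_size")

    # Consistency check: answers to related questions must agree.
    if not isinstance(land, (int, float)) or land < 0:
        errors.append("land_owned_acres missing or negative")
    elif record.get("owns_land") == "no" and land > 0:
        errors.append("owns_land is 'no' but land_owned_acres > 0")

    return errors


def check_batch(records):
    """Check every record and flag duplicate household ids.
    Returns a dict mapping household_id to its list of errors."""
    report = {}
    seen = set()
    for record in records:
        household_id = record.get("household_id")
        errors = check_record(record)
        if household_id in seen:
            errors.append("duplicate household_id")
        seen.add(household_id)
        if errors:
            report.setdefault(household_id, []).extend(errors)
    return report
```

Running such checks on each day's batch lets flagged records go back to the entry personnel or surveyors while the paper returns are still at hand.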
