Jelke Bethlehem
The ever changing landscape of survey research
Overview
Survey research developments through the ages
• From census to survey.
• The fundamental principles of probability
sampling.
• The rise of computer-assisted interviewing
• The conquest of the web.
• The eternal problem of nonresponse.
• The future: are surveys forever?
From census to survey
Even the old empires needed statistical information
• Initially always complete enumeration (census).
• China and Egypt (1000 BC): overviews for taxation and military
affairs.
• Roman Empire (8 BC): counts of
people and their possessions.
• Example:
Census in Bethlehem,
Pieter Bruegel, 1566.
From census to survey
The Domesday Book
• Commissioned in 1086 by William
the Conqueror after he conquered
England from Normandy in 1066.
• Compiled by royal commissioners.
• Data about 13,000 places,
10,000 facts per county.
• Data about landowners, slaves,
free people, woodland, pasture,
mills, fish ponds, estimated
value of the property.
From census to survey
The Quipucamayoc
• Statistician in the Inca Empire
(1000-1500 AD).
• A Quipucamayoc in each district.
• Count of people, young men, houses,
llamas.
• Recorded on quipus.
• Knots in coloured ropes, decimal system.
• New acronym: RAPI
(Rope Assisted Personal Interviewing).
From census to survey
The first modern censuses
• Standardized questionnaires.
• Legal obligation to participate.
• New France (Canada): 1666,
Jean Talon, N=3215.
• Sweden: 1748,
Denmark: 1769.
• Netherlands: 1795,
new system of electoral
constituencies in the Batavian
Republic.
From census to survey
Why no sampling until 1895?
• Replacing some people by computations was seen as a form of
discrimination.
• No reliable conclusions possible based on sample data.
The dawn of a new era
• Industrialization.
• Urbanization.
• Population growth.
• Centralized government.
• Need for information.
From census to survey
New developments
• 1895: Anders Kiaer proposes his ‘Representative
Method’, a kind of quota sampling. He could not
compute the accuracy of estimates.
• 1906: Arthur Bowley proposes random sampling.
Probability Theory can be applied. Estimators have
normal distribution. Variances can be computed.
• 1934: Jerzy Neyman introduces the confidence
interval. He also shows that quota sampling
(purposive sampling) does not work.
From census to survey
The fundamental principles of sampling
• Samples must be selected by means of probability sampling.
• Every element must have a positive probability of selection.
• All selection probabilities must be known.
Consequences
• It is always possible to construct an unbiased estimator.
• Estimators often have an (approximately) normal distribution.
• Accuracy of estimators can be computed (confidence intervals).
Consequences
• For other forms of sampling (e.g. quota
sampling), it is not clear how reliable and
valid the outcomes are.
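The principles above can be illustrated with a minimal sketch. It assumes a hypothetical population of 1000 values and unequal selection probabilities that are all positive and known; Poisson sampling is used only because it is short to code, not because the slides prescribe it. The Horvitz-Thompson estimator weights every sampled value by the inverse of its selection probability, and averaged over many samples it comes close to the true total, whereas the unweighted sample mean does not. All names and numbers are illustrative assumptions.

```python
import random

def poisson_sample(values, probs, rng):
    """Select each element independently with its own known probability."""
    return [(y, p) for y, p in zip(values, probs) if rng.random() < p]

def horvitz_thompson_total(sample):
    """Horvitz-Thompson estimator: weight each value by 1 / selection probability."""
    return sum(y / p for y, p in sample)

rng = random.Random(1)

# Hypothetical population; larger values get larger selection probabilities.
population = [rng.uniform(10, 100) for _ in range(1000)]
probs = [0.05 + 0.25 * (y - 10) / 90 for y in population]  # between 0.05 and 0.30
true_total = sum(population)

runs = 2000
ht_estimates = []
naive_estimates = []
for _ in range(runs):
    sample = poisson_sample(population, probs, rng)
    ht_estimates.append(horvitz_thompson_total(sample))
    # Naive estimator: ignores the unequal probabilities.
    sample_mean = sum(y for y, _p in sample) / len(sample)
    naive_estimates.append(len(population) * sample_mean)

mean_ht = sum(ht_estimates) / runs
mean_naive = sum(naive_estimates) / runs
print(f"true total        : {true_total:.0f}")
print(f"mean HT estimate  : {mean_ht:.0f}")
print(f"mean naive estimate: {mean_naive:.0f}")
```

The naive estimate overshoots because large values are selected more often; the inverse-probability weights are what remove that distortion, which is only possible when the selection probabilities are known.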
From census to survey
Developments in the Netherlands
• First experiments with Representative Method in 1923.
• First real survey around 1947 (income statistics).
• Most important sampling frame:
population register.
• Systematic sampling instead of random
sampling.
• Two-stage sampling: first municipalities,
then people in selected municipalities.
• Everything was done manually.
• Collecting and processing survey data
was cumbersome.
Population register, 1946
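Systematic sampling from a register, as mentioned above, can be sketched as follows: pick a random starting point within the first interval and then take every element at a fixed step. The register here is a hypothetical list of person numbers, and the function name and parameters are assumptions made for illustration.

```python
import random

def systematic_sample(frame, n, rng):
    """Draw n elements from the frame using a random start and a fixed step."""
    step = len(frame) / n              # sampling interval (may be fractional)
    start = rng.random() * step        # random start in [0, step)
    # Clamp the index to guard against floating-point rounding at the end.
    return [frame[min(int(start + k * step), len(frame) - 1)] for k in range(n)]

rng = random.Random(42)
register = list(range(1, 1001))        # hypothetical register of 1000 persons
sample = systematic_sample(register, 10, rng)
print(sample)
```

Every element has the same selection probability n/N, so the design still satisfies the principles of probability sampling, while requiring only one random number. That made it practical when samples were drawn by hand.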
The rise of microcomputers
From mainframe to micro
• 1984: Data Editing Research Project. Conclusion: data processing is
costly and time-consuming. A different approach is needed.
• The mainframe computer was replaced by a network of
microcomputers.
• Computer-assisted data entry by subject-matter specialists.
• First experiments with laptops for face-to-face interviewing (CAPI).
The rise of microcomputers
The Teleac Statistics Course (1986)
• TV-course about statistics.
• Substantial contributions by Statistics Netherlands.
• Fragment about sampling frames, laptops and nonresponse
The rise of microcomputers
Computer-assisted interviewing
• Faster data processing: data already entered during the interview.
• Better data: error checking and correction during the interview.
• Easier for the interviewers: route enforced by computer.
• First experiments in 1984.
• First survey in 1987 (Labour Force Survey).
The rise of microcomputers
The Blaise System for computer-assisted interviewing
• Developed by Statistics Netherlands.
• De-facto standard in the world of official statistics.
• CAPI (face-to-face interviewing) in 1987 (MS-DOS).
• CATI (telephone interviewing) in 1990.
• From MS-DOS to Windows in 1998.
The rise of microcomputers
Blaise around the world
1989 Basque Country
1990 East Berlin, DDR
1990 Belgrade, Yugoslavia
2001 Washington, USA
The conquest of the web
The World Wide Web
• 1983: First experiments with e-mail surveys.
• 1989: Tim Berners-Lee introduces HTML, World Wide Web, first
browsers.
• 1995: HTML 2.0, data transfer from user to server, support for forms,
first web questionnaires.
The conquest of the web
The popularity of web surveys
• Easy: simple access to large group of potential respondents.
• Cheap: no interviewers, no printing, no mailing.
• Fast: a survey can be launched very quickly.
• Everybody can do it!
The methodological challenges of web surveys
• Under-coverage
• Sample selection
• Measurement errors
• Nonresponse
The conquest of the web
Under-coverage
• Not every person in the
population has access to
internet.
• Those without internet may
differ from those with internet.
• People without internet will
never be selected for a web
survey.
• Therefore the sample is not
representative.
• Web survey based estimates
may be biased!
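The size of the under-coverage bias can be made concrete with a small sketch. If the estimate is based only on people with internet, the bias of the mean equals the share of people without internet multiplied by the difference between the two group means. The population below is entirely hypothetical (800 people with internet, 200 without, and a yes/no survey variable).

```python
def mean(values):
    return sum(values) / len(values)

def undercoverage_bias(with_internet, without_internet):
    """Bias of the internet-only mean: share without internet times
    the difference between the two subpopulation means."""
    n_total = len(with_internet) + len(without_internet)
    share_without = len(without_internet) / n_total
    return share_without * (mean(with_internet) - mean(without_internet))

# Hypothetical population: 1 = has the property of interest, 0 = does not.
with_internet = [1] * 480 + [0] * 320       # 60% among people with internet
without_internet = [1] * 60 + [0] * 140     # 30% among people without internet

population_mean = mean(with_internet + without_internet)
bias = undercoverage_bias(with_internet, without_internet)
print(f"population mean   : {population_mean:.2f}")
print(f"internet-only mean: {mean(with_internet):.2f}")
print(f"bias              : {bias:.2f}")
```

The bias vanishes only if internet coverage is complete or if people with and without internet do not differ on the survey variable.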
The conquest of the web
Internet coverage in Europe
Top 3: Iceland (96%), Netherlands (95%), Norway (94%)
Bottom 3: Greece (56%), Bulgaria (54%), Turkey (49%)
Source: Eurostat, 2013
The conquest of the web
The sample selection problem
• How to select a random sample if there is no sampling frame with
e-mail addresses?
Possible solutions
• Use another mode for sample recruitment, for example mail or
telephone. This makes the survey more expensive.
• Use the respondents of another (telephone or face-to-face) survey.
This produces samples that lack representativity.
• Draw the sample from a web panel. This creates a new problem: how
to get a representative web panel?
• Select the sample by means of self-selection. There is no probability
sampling. People spontaneously decide to take part in the survey.
The conquest of the web
The dangers of self-selection
• Also people outside the target
population of the survey can
respond.
• Often people can respond
more than once.
• Groups of people may attempt
to manipulate the outcomes of
the survey.
Local elections in Amsterdam. Who won the debate (Jan. 2014)?
The conquest of the web
Measurement errors in web surveys
• Respondents are not interested in the topic of the survey.
• There are no interviewers to assist them in answering questions.
• Participating is not important for them.
• They do not read the questions, but scan through them looking for
words and phrases that catch their eye.
• They know there is no penalty for giving a wrong answer.
• They do not figure out how the questionnaire works. They just
muddle through until they reach the end.
Satisficing
• Respondents do not give the optimal answer, but the first more or
less acceptable answer that comes to mind.
The nonresponse problem
Nonresponse in surveys
• Persons who are selected in the sample (and who belong to the
target population) do not provide the requested information.
Causes of nonresponse
• No contact
• Refusal
• Not able
Consequences of nonresponse
• Fewer observations
• Representativity may be affected
• Wrong conclusions are drawn.
The nonresponse problem
Nonresponse problems increase over time
• Response rates of the Labour Force Survey (EBB):
The nonresponse problem
Nonresponse bias
• The magnitude of the nonresponse bias is approximately equal to:
$$B(\bar{y}_R) = \frac{R_{\rho Y} \, S_{\rho} \, S_Y}{\bar{\rho}}$$
• $R_{\rho Y}$: correlation between response behaviour and survey variable
• $S_{\rho}$: variation (standard deviation) of response probabilities
• $S_Y$: standard deviation of the survey variable
• $\bar{\rho}$: response rate (mean response probability)
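The bias expression can be checked numerically. The sketch below computes the correlation between response probabilities and the survey variable, multiplies it by the two standard deviations, and divides by the mean response probability. It then compares the result with the difference between the expected respondent mean and the population mean. The population and the response probabilities are hypothetical; population (divisor N) standard deviations are assumed.

```python
import random
from statistics import fmean, pstdev

def nonresponse_bias(rho, y):
    """Correlation * S_rho * S_y / mean response probability."""
    rho_bar, y_bar = fmean(rho), fmean(y)
    cov = fmean([(r - rho_bar) * (v - y_bar) for r, v in zip(rho, y)])
    s_rho, s_y = pstdev(rho), pstdev(y)
    corr = cov / (s_rho * s_y)
    return corr * s_rho * s_y / rho_bar

rng = random.Random(7)

# Hypothetical survey variable and response probabilities that rise with it.
y = [rng.gauss(50, 10) for _ in range(5000)]
rho = [min(0.9, max(0.1, 0.5 + 0.02 * (v - 50))) for v in y]

bias = nonresponse_bias(rho, y)

# Expected mean among respondents: values weighted by their response probability.
expected_respondent_mean = sum(r * v for r, v in zip(rho, y)) / sum(rho)
direct_bias = expected_respondent_mean - fmean(y)

print(f"bias from formula: {bias:.3f}")
print(f"direct difference: {direct_bias:.3f}")
```

The three ingredients show where to act: the bias shrinks when response behaviour is unrelated to the survey variable, when response probabilities vary little, or when the response rate is high.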
The nonresponse problem
Nonresponse research in the last 30 years
• The Basic Question Approach.
• Advanced correction techniques (linear weighting).
• Advanced weighting software: Bascula.
• Many auxiliary variables available through System of Social
Statistical Datasets (SSD).
• Focus on the concept of response probability.
• The R-indicator for survey response quality.
• Adaptive/responsive survey design.
• Mixed-mode surveys.
But …
• The nonresponse problem is still not solved.
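The R-indicator mentioned in the list above is commonly defined as one minus twice the standard deviation of the response probabilities. A minimal sketch, assuming the probabilities are already known or estimated and using the population standard deviation (the choice of divisor is an assumption here):

```python
from statistics import pstdev

def r_indicator(response_probs):
    """R = 1 - 2 * S(rho): 1 means equal response probabilities for everyone,
    0 means the largest possible variation."""
    return 1 - 2 * pstdev(response_probs)

equal_probs = [0.5] * 10                 # everyone equally likely to respond
extreme_probs = [0.0, 1.0] * 5           # half never respond, half always do
mixed_probs = [0.3, 0.4, 0.5, 0.6, 0.7]  # moderate variation

print(r_indicator(equal_probs))
print(r_indicator(extreme_probs))
print(round(r_indicator(mixed_probs), 3))
```

A high value says the response is balanced with respect to the variables used to estimate the probabilities; it does not by itself guarantee the absence of bias.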
Where are we now?
Budget cuts
• Can we change from face-to-face and telephone surveys to web
surveys without sacrificing quality?
Sampling frames
• There are no proper sampling frames for web surveys.
• It becomes more and more difficult to select a sample for a telephone
survey.
Increasing nonresponse problems
• Response rates < 40% for web surveys (everywhere).
• Response rates < 10% for telephone surveys (RDD, US).
• Do the principles of probability sampling still apply?
Where should we go?
Is there a future for surveys?
• How to collect data in the future?
Where should we go?
Solution 1:
• Abandon probability sampling.
• It is much easier to collect data with self-selection surveys.
• Do not worry about lack of representativity. You can always correct
this later by applying adjustment weighting.
• Ultimate solution: set up a large self-selection web panel
However:
• The representativity problems of self-selection surveys are much
bigger than those of probability surveys + nonresponse.
• Is it really possible to remove the bias of the estimates? Not if
specific subpopulations are missing completely.
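Adjustment weighting and its limits can be sketched with post-stratification, one simple weighting technique. Each group mean is weighted by the group's known population share. The age groups, shares, and answers below are hypothetical, and the correction only works if respondents within a group resemble the nonrespondents in that group. When a group has no respondents at all, there is nothing to weight.

```python
from collections import defaultdict

def poststratified_mean(sample, population_shares):
    """sample: list of (stratum, value); population_shares: stratum -> share."""
    groups = defaultdict(list)
    for stratum, value in sample:
        groups[stratum].append(value)
    missing = set(population_shares) - set(groups)
    if missing:
        # A completely missing subpopulation cannot be repaired by weighting.
        raise ValueError(f"no respondents in strata: {sorted(missing)}")
    return sum(share * sum(groups[s]) / len(groups[s])
               for s, share in population_shares.items())

# Hypothetical self-selection sample: the young are heavily over-represented.
sample = ([("young", 1)] * 30 + [("young", 0)] * 30 +
          [("middle", 1)] * 18 + [("middle", 0)] * 12 +
          [("old", 1)] * 8 + [("old", 0)] * 2)
shares = {"young": 0.3, "middle": 0.4, "old": 0.3}

unweighted = sum(value for _, value in sample) / len(sample)
weighted = poststratified_mean(sample, shares)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

Dropping all "old" respondents from the sample makes the function raise an error, which mirrors the point above: no weight can stand in for a subpopulation that is absent.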
Where should we go?
Solution 2:
• Abandon probability sampling (design-based estimation).
• Go for model-based estimation.
• Try to find models for the relation between variables in the
population (superpopulations).
• If you have a good model you do not need random sampling any
more.
However:
• Can we find such models?
• Do these models remain valid over time?
• Do they have sufficient explanatory power?
• Need for auxiliary variables.
Where should we go?
Solution 3:
• Abandon surveys.
• Rely on available big data sets.
• If a data set is not representative, correct it by means of weighting or
an additional survey.
However:
• Big data sets do not imply good statistics!
• We do not need big data! We need representative data!
• Example: the Street Bump App in Boston (potholes, FT, 2014).
• Can we make all statistics we need with big data?
• Are correction techniques effective? We probably need more
auxiliary variables.
Where should we go?
Solution 4:
• Continue with probability surveys.
• There will always be surveys/polls, particularly during election
campaigns.
• Invest in better correction techniques.
• Try to find better auxiliary variables.
• Statistics Netherlands should extend the SSD.
• Include other variables: psychographic, webographic variables.
• Make it available to every researcher.
Where should we go?
My advice:
• Do not throw out the baby with the bath water
A life full of statistics
The first encounter with statistics (1969)
• Contest for young scientists and inventors.
• Spectral sensitivity of insect eyes.
A life full of statistics
Methodological research and tools go together.
• Software development at Mathematical Centre.
• Teletype (1970), punched cards (1974), ALGOL 60.
A life full of statistics
Methodological research and tools go together.
• Software development at Mathematical Centre and CBS
• Programmable calculator (MC), first microcomputers (CBS).
A life full of statistics
A life full of travel and meeting colleagues
Compstat 1980, Edinburgh
ISI, Madrid, 1983, Leslie Kish
Washington, 1987, Blaise
New Zealand, 1989, coding
ISI, Cairo, 1991, Anota
Zimbabwe, 1993, UN, Blaise
ISI, Durban, 2009, Nonresponse
Italy, 2010, Web surveys, Silvia Biffignandi
Lausanne, 2011, nonresponse course (with Ineke Stoop)
Korea, 2011, survey methodology course (with Deirdre Giesen)
Finally
Something to read (in Dutch)
• There are surveys everywhere.
• Often things go wrong with surveys.
• There still is a lot of work to do for
survey methodologists.
• Many examples in my columns
for M-zine, the newsletter of the
methodology department.