
11 Sampling Methods for Web and E-mail Surveys

Ronald D. Fricker, Jr

ABSTRACT

This chapter is a comprehensive overview of sampling methods for web and e-mail (‘Internet-based’) surveys. It reviews the various types of sampling method – both probability and non-probability – and examines their applicability to Internet-based surveys. Issues related to Internet-based survey sampling are discussed, including difficulties assembling sampling frames for probability sampling, coverage issues, and nonresponse and selection bias. The implications of the various survey mode choices on statistical inference and analyses are summarized.

INTRODUCTION

In the context of conducting surveys or collecting data, sampling is the selection of a subset of a larger population to survey. This chapter focuses on sampling methods for web and e-mail surveys, which taken together we call ‘Internet-based’ surveys. In our discussion we will frequently compare sampling methods for Internet-based surveys to various types of non-Internet-based surveys, such as those conducted by postal mail and telephone, which in the aggregate we refer to as ‘traditional’ surveys.

The chapter begins with a general overview of sampling. Since there are many fine textbooks on the mechanics and mathematics of sampling, we restrict our discussion to the main ideas that are necessary to ground our discussion on sampling for Internet-based surveys. Readers already well versed in the fundamentals of survey sampling may wish to proceed directly to the section on Sampling Methods for Internet-based Surveys.

WHY SAMPLE?

Surveys are conducted to gather information about a population. Sometimes the survey is conducted as a census, where the goal is to survey every unit in the population. However, it is frequently impractical or impossible to survey an entire population, perhaps owing to either cost constraints or some other practical constraint, such as that it may not be possible to identify all the members of the population.

An alternative to conducting a census is to select a sample from the population and survey only those sampled units. As shown in Figure 11.1, the idea is to draw a sample from the population and use data collected from the sample to infer information about the entire population. To conduct statistical inference (i.e., to be able to make quantitative statements about the unobserved population statistic), the sample must be drawn in such a fashion that one can both calculate appropriate sample statistics and estimate their standard errors. To do this, as will be discussed in this chapter, one must use a probability-based sampling methodology.

A survey administered to a sample can have a number of advantages over a census, including:

• lower cost
• less effort to administer
• better response rates
• greater accuracy.

The advantages of lower cost and less effort are obvious: keeping all else constant, reducing the number of surveys should cost less and take less effort to field and analyze. However, that a survey based on a sample rather than a census can give better response rates and greater accuracy is less obvious. Yet, greater survey accuracy can result when the sampling error is more than offset by a decrease in nonresponse and other biases, perhaps due to increased response rates. That is, for a fixed level of effort (or funding), a sample allows the surveying organization to put more effort into maximizing responses from those surveyed, perhaps via more effort invested in survey design and pre-testing, or perhaps via more detailed non-response follow-up.

Figure 11.1 An illustration of sampling. When it is impossible or infeasible to observe a population statistic directly, data from a sample appropriately drawn from the population can be used to infer information about the population

What does all of this have to do with Internet-based surveys? Before the Internet, large surveys were generally expensive to administer and hence survey professionals gave careful thought to how to best conduct a survey in order to maximize information accuracy while minimizing costs. However, as illustrated in Figure 11.2, the Internet now provides easy access to a plethora of inexpensive survey software, as well as to millions of potential survey respondents, and it has lowered other costs and barriers to surveying. While this is good news for survey researchers, these same factors have also facilitated a proliferation of bad survey-research practice.

For example, in an Internet-based survey the marginal cost of collecting additional data can be virtually zero. At first blush, this seems to be an attractive argument in favour of attempting to conduct censuses, or for simply surveying large numbers of individuals without regard to how the individuals are recruited into the sample. And, in fact, these approaches are being used more frequently with Internet-based surveys, without much thought being given to alternative sampling strategies or to the potential impact such choices have on the accuracy of the survey results. The result is a proliferation of poorly conducted ‘censuses’ and surveys based on large convenience samples that are likely to yield less accurate information than a well-conducted survey of a smaller sample.

Conducting surveys, as in all forms of data collection, requires making compromises. Specifically, there are almost always trade-offs to be made between the amount of data that can be collected and the accuracy of the data collected. Hence, it is critical for researchers to have a firm grasp of the trade-offs they implicitly or explicitly make when choosing a sampling method for collecting their data.

AN OVERVIEW OF SAMPLING

There are many ways to draw samples from a population – and there are also many ways that sampling can go awry. We intuitively think of a good sample as one that is representative of the population from which the sample has been drawn. By ‘representative’ we do not necessarily mean the sample matches the population in terms of observable characteristics, but rather that the results from the data we collect from the sample are consistent with the results we would have obtained if we had collected data on the entire population.

Figure 11.2 Banners for various Internet survey software (accessed January 2007)

Of course, the phrase ‘consistent with’ is vague and, if this were an exposition of the mathematics of sampling, would require a precise definition. However, we will not cover the details of survey sampling here.1 Rather, in this section we will describe the various sampling methods and discuss the main issues in characterizing the accuracy of a survey, with a particular focus on terminology and definitions, in order that we can put the subsequent discussion about Internet-based surveys in an appropriate context.

Sources of error in surveys

The primary purpose of a survey is to gather information about a population. However, even when a survey is conducted as a census, the results can be affected by several sources of error. A good survey design seeks to reduce all types of error – not only the sampling error arising from surveying a sample of the population. Table 11.1 below lists the four general categories of survey error as presented and defined in Groves (1989) as part of his ‘Total Survey Error’ approach.

Errors of coverage occur when some part of the population cannot be included in the sample. To be precise, Groves specifies three different populations:

1 The population of inference is the population that the researcher ultimately intends to draw conclusions about.

2 The target population is the population of inference less various groups that the researcher has chosen to disregard.

3 The frame population is that portion of the target population which the survey materials or devices delimit, identify, and subsequently allow access to (Wright and Tsao, 1983).

The survey sample then consists of those members of the sampling frame that were chosen to be surveyed, and coverage error is the difference between the frame population and the population of inference.

The two most common approaches to reducing coverage error are:

• obtaining as complete a sampling frame as possible (or employing a frameless sampling strategy in which most or all of the target population has a positive chance of being sampled);

• post-stratifying to weight the survey sample to match the population of inference on some observed key characteristics (a minimal weighting sketch follows this list).
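To make the second approach concrete, the sketch below computes post-stratification weights as the ratio of each group’s population share to its sample share, and then forms a weighted estimate. It is a minimal Python illustration; the age groups, shares, and responses are hypothetical, not taken from the chapter.

```python
# A minimal post-stratification sketch (hypothetical data throughout).
from collections import Counter

# Survey sample: (age group, response of interest) for each respondent.
sample = [("18-34", 1), ("18-34", 0), ("35-54", 1), ("55+", 1), ("35-54", 0)]

# Known population shares for the same groups (e.g., from census figures).
population_share = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}

# Weight each respondent by population share / sample share for their group.
counts = Counter(group for group, _ in sample)
n = len(sample)
weights = [population_share[g] / (counts[g] / n) for g, _ in sample]

# Weighted estimate of the population mean of the response.
estimate = sum(w * y for w, (_, y) in zip(weights, sample)) / sum(weights)
print(f"Post-stratified estimate: {estimate:.3f}")
```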

Sampling error arises when a sample of the target population is surveyed. It results from the fact that different samples will generate different survey data. Roughly speaking, assuming a random sample, sampling error is reduced by increasing the sample size.

Nonresponse errors occur when data is not collected on either entire respondents (unit nonresponse) or individual survey questions (item nonresponse). Groves (1989) calls nonresponse ‘an error of nonobservation’. The response rate, which is the ratio of the number of survey respondents to the number sampled, is often taken as a measure of how well the survey results can be generalized. Higher response rates are taken to imply a lower likelihood of nonresponse bias.

Measurement error arises when the survey response differs from the ‘true’ response. For example, respondents may not answer sensitive questions honestly for a variety of reasons, or respondents may misinterpret or make errors in answering questions. Measurement error is reduced in a variety of ways, including careful testing and revision of the survey instrument and questions, choice of survey mode or modes, etc.

Table 11.1 Sources of survey error according to Groves (1989)

Type of error   Definition
Coverage        ‘…the failure to give any chance of sample selection to some persons in the population’.
Sampling        ‘…heterogeneity on the survey measure among persons in the population’.
Nonresponse     ‘…the failure to collect data on all persons in the sample’.
Measurement     ‘…inaccuracies in responses recorded on the survey instruments’.

Sampling methods

Survey sampling can be grouped into two broad categories: probability-based sampling (also loosely called ‘random sampling’) and non-probability sampling. A probability-based sample is one in which the respondents are selected using some sort of probabilistic mechanism, and where the probability with which every member of the frame population could have been selected into the sample is known. The sampling probabilities do not necessarily have to be equal for each member of the sampling frame.

Types of probability sample include:

• Simple random sampling (SRS) is a method in which any two groups of equal size in the population are equally likely to be selected. Mathematically, simple random sampling selects n units out of a population of size N such that every sample of size n has an equal chance of being drawn.

• Stratified random sampling is useful when the population is comprised of a number of homogeneous groups. In these cases, it can be either practically or statistically advantageous (or both) to first stratify the population into the homogeneous groups and then use SRS to draw samples from each group.

• Cluster sampling is applicable when the natural sampling unit is a group or cluster of individual units. For example, in surveys of Internet users it is sometimes useful or convenient to first sample by discussion groups or Internet domains, and then to sample individual users within the groups or domains.

• Systematic sampling is the selection of every kth element from a sampling frame or from a sequential stream of potential respondents. Systematic sampling has the advantage that a sampling frame does not need to be assembled beforehand. In terms of Internet surveying, for example, systematic sampling can be used to sample sequential visitors to a website. The resulting sample is considered to be a probability sample as long as the sampling interval does not coincide with a pattern in the sequence being sampled and a random starting point is chosen (a brief code sketch of these methods follows this list).
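To ground the definitions above, the sketch below draws simple random, stratified, and systematic samples from a small toy frame. It is a minimal Python illustration; the frame, strata, and sample sizes are invented for the example.

```python
# Minimal sketches of three probability sampling methods (illustrative only;
# the frame, strata, and sample sizes are hypothetical).
import random

random.seed(1)  # for a reproducible illustration

# A toy sampling frame of N = 12 units, each tagged with a stratum.
frame = [{"id": i, "stratum": "A" if i < 8 else "B"} for i in range(12)]

# Simple random sampling: every sample of size n is equally likely.
srs = random.sample(frame, k=4)

# Stratified random sampling: an SRS within each homogeneous group.
strata = {"A": [u for u in frame if u["stratum"] == "A"],
          "B": [u for u in frame if u["stratum"] == "B"]}
stratified = random.sample(strata["A"], k=2) + random.sample(strata["B"], k=2)

# Systematic sampling: every kth unit from a random starting point.
k = 3
start = random.randrange(k)
systematic = frame[start::k]

print([u["id"] for u in srs], [u["id"] for u in stratified],
      [u["id"] for u in systematic])
```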

There are important analytical and practical considerations associated with how one draws and subsequently analyzes the results from each of these types of probability-based sampling scheme, but space limitations preclude covering them here. Readers interested in such details should consult texts such as Kish (1965), Cochran (1977), Fink (2003), or Fowler (2002).

Non-probability samples, sometimes called convenience samples, arise either when the probability with which every unit or respondent was included in the sample cannot be determined, or when it is left up to each individual to choose to participate in the survey. For probability samples, the surveyor selects the sample using some probabilistic mechanism and the individuals in the population have no control over this process. In contrast, for example, a web survey may simply be posted on a website where it is left up to those browsing through the site to decide to participate in the survey (‘opt in’) or not. As the name implies, such non-probability samples are often used because it is somehow convenient to do so.

While in a probability-based survey participants can choose not to participate in the survey (‘opt out’), rigorous surveys seek to minimize the number who decide not to participate (i.e., nonresponse). In both cases it is possible to have bias, but in non-probability surveys the bias has the potential to be much greater, since it is likely that those who opt in are not representative of the general population. Furthermore, in non-probability surveys there is often no way to assess the potential magnitude of the bias, since there is generally no information on those who chose not to opt in.

Non-probability-based samples often require much less time and effort, and thus usually are less costly to generate, but generally they do not support statistical inference. However, non-probability-based samples can be useful for research in other ways. For example, early in the course of research, responses from a convenience sample might be useful in developing research hypotheses. Responses from convenience samples might also be useful for identifying issues, defining ranges of alternatives, or collecting other sorts of non-inferential data. For a detailed discussion of the application of various types of non-probability-based sampling method to qualitative research, see Patton (2002).

Specific types of non-probability samplesinclude the following.

• Quota sampling requires the survey researcher only to specify quotas for the desired number of respondents with certain characteristics. The actual selection of respondents is then left up to the survey interviewers who must match the quotas. Because the choice of respondents is left up to the survey interviewers, subtle biases may creep into the selection of the sample (see, for example, the Historical Survey Gaffes section).

• Snowball sampling is often used when the desired sample characteristic is so rare that it is extremely difficult or prohibitively expensive to locate a sufficiently large number of respondents by other means (such as simple random sampling). Snowball sampling relies on referrals from initial respondents to generate additional respondents. While this technique can dramatically lower search costs, it comes at the expense of introducing bias, because the technique itself substantially increases the likelihood that the sample will not be representative of the population.

• Judgement sampling is a type of convenience sampling in which the researcher selects the sample based on his or her judgement. For example, a researcher may decide to draw the entire sample from one ‘representative’ Internet-user community, even though the population of interest includes all Internet users. Judgement sampling can also be applied in even less structured ways, without the application of any random sampling.

Bias versus variance

If a sample is systematically not representative of the population of inference in some way, then the resulting analysis is biased. For example, results from a survey of Internet users about personal computer usage are unlikely to accurately quantify computer usage in the general population, simply because the sample is comprised only of those who use computers.

Taking larger samples will not correct for bias, nor is a large sample evidence of a lack of bias. For example, an estimate of average computer usage based on a sample of Internet users will likely overestimate the average usage in the general population regardless of how many Internet users are surveyed. Randomization is used to minimize the chance of bias. The idea is that by randomly choosing potential survey respondents the sample is likely to ‘look like’ the population, even in terms of those characteristics that cannot be observed or known.

Variance, on the other hand, is simply a measure of variation in the observed data. It is used to calculate the standard error of a statistic, which is a measure of the variability of the statistic. The precision of statistical estimates drawn via probabilistic sampling mechanisms is improved by larger sample sizes.
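To see the variance side numerically, the sketch below computes the standard error of a sample mean under simple random sampling for several sample sizes; the population is simulated and entirely hypothetical. The standard error shrinks roughly as 1/sqrt(n), whereas any bias in how the sample was drawn would be unaffected by n.

```python
# Standard error of a sample mean under SRS (illustrative; simulated data).
import math
import random

random.seed(2)
population = [random.gauss(50, 10) for _ in range(100_000)]  # toy population

for n in (25, 100, 400):
    sample = random.sample(population, k=n)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    se = math.sqrt(var / n)  # standard error falls as 1/sqrt(n)
    print(f"n={n:4d}  mean={mean:6.2f}  standard error={se:5.2f}")
```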

Some important sources of bias

Bias can creep into survey results in many different ways. In the absence of significant nonresponse, probability-based sampling is assumed to minimize the possibility of bias. Convenience sampling, on the other hand, is generally assumed to have a higher likelihood of generating a biased sample. However, even with randomization, surveys of and about people may be subject to other kinds of bias. For example, respondents may be inclined to over- or understate certain things (‘sensitivity bias’), particularly with socially delicate questions (such as questions about income or sexual orientation). Here we just focus on some of the more common sources of bias related to sampling.

• Frame coverage bias occurs when the sampling frame misses some important part of the population. For example, an e-mail survey using a list of e-mail addresses will miss those without an e-mail address.

• Selection bias is an error in how the individuals or units are chosen to participate in the survey. It can occur, for example, if survey participation depends on respondents having access to particular equipment, as with Internet-based surveys that miss those without Internet access.

• Size bias occurs when some units have a greater chance of being selected than others. For example, in a systematic sample of website visitors, frequent site visitors are more likely to be selected into the sample than infrequent visitors. In a similar vein, when selecting from a frame consisting of e-mail addresses, individuals with multiple e-mail addresses would have a higher chance of being selected into a sample (a weighting sketch addressing this follows the list).

• Nonresponse bias occurs if those who refuse to answer the survey are somehow systematically different from those who do answer it.
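When unequal selection chances are known, the size bias just described can be corrected by weighting each respondent by the inverse of their selection probability (a Horvitz-Thompson-style adjustment, a standard technique rather than one prescribed by this chapter). The sketch below applies it to the multiple-e-mail-address example with invented data.

```python
# Inverse-probability weighting to correct size bias (hypothetical data).
# A person with m e-mail addresses is m times as likely to enter the sample,
# so each respondent is down-weighted by 1/m.
respondents = [
    {"y": 1, "n_addresses": 1},
    {"y": 0, "n_addresses": 3},
    {"y": 1, "n_addresses": 2},
    {"y": 1, "n_addresses": 1},
]

weights = [1 / r["n_addresses"] for r in respondents]
estimate = sum(w * r["y"] for w, r in zip(weights, respondents)) / sum(weights)

print(f"Unweighted mean: {sum(r['y'] for r in respondents) / len(respondents):.3f}")
print(f"Size-bias-adjusted mean: {estimate:.3f}")
```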

Historical survey gaffes

A famous example of a survey that reached exactly the wrong inferential conclusion as a result of bias, in this case frame coverage and nonresponse bias, is the ‘Literary Digest’ poll in the 1936 United States presidential election. As described in Squires (1988), for their survey ‘Literary Digest’ assembled a sampling frame from telephone numbers and automobile registration lists. While using telephone numbers today might result in a fairly representative sample of the population, in 1936 only one in four households had a telephone, and those were the more well-to-do. Compounding this, automobile registration lists only further skewed the frame towards individuals with higher incomes.

‘Literary Digest’ mailed 10 million straw-vote ballots, of which 2.3 million were returned, an impressively large number, but one that represented less than a 25 percent response rate. Based on the poll data, ‘Literary Digest’ predicted that Alfred Landon would beat Franklin Roosevelt 55 percent to 41 percent. In fact, Roosevelt beat Landon by 61 percent to 37 percent. This was the largest error ever made by a major poll and is considered to be one of the causes of ‘Literary Digest’s demise in 1938.

Gallup, however, called the 1936 presidential election correctly, even though he used significantly less data. But even Gallup, a pioneer in modern survey methods, didn’t always get it right. In the 1948 United States presidential election between Harry S Truman and Thomas E. Dewey, Gallup used a quota sampling method in which each pollster was given a set of quotas of types of people to interview, based on demographics. While that seemed reasonable at the time, the survey interviewers, for whatever conscious or subconscious reason, were biased towards interviewing Republicans more often than Democrats. As a result, Gallup predicted a Dewey win of 49.5 percent to 44.5 percent; but almost the opposite occurred, with Truman beating Dewey with 49.5 percent of the popular vote to Dewey’s 45.1 percent (a difference of almost 2.2 million votes).2

SAMPLING METHODS FOR INTERNET-BASED SURVEYS

This section describes specific types of Internet-based survey and the sampling methods that are applicable to each. We concentrate on differentiating whether particular sampling methods and their associated surveys allow for generalization of survey results to populations of inference or not, providing examples of some surveys that were done appropriately and well, and others that were less so. Examples that fall into the latter category should not be taken as a condemnation of a particular survey or sampling method, but rather as illustrations of inappropriate application, execution, analysis, etc. Couper (2000: 465–466) perhaps said it best:

Any critique of a particular Web survey approach must be done in the context of its intended purpose and the claims it makes. Glorifying or condemning an entire approach to survey data collection should not be done on the basis of a single implementation, nor should all Web surveys be treated as equal.

Furthermore, as we previously discussed, simply because a particular method does not allow for generalizing beyond the sample does not imply that the methods and resulting data are not useful in other research contexts.

Similarly to Couper (2000), Table 11.2 lists the most common probability and non-probability sampling methods, and indicates which Internet-based survey mode or modes may be used with each method. For example, it is possible to conduct both web and e-mail surveys using a list-based sampling frame methodology. Conversely, while it is feasible to conduct an entertainment poll by e-mail, virtually all such polls are conducted via web surveys.

Table 11.2 Types of Internet-based survey and associated sampling methods

Sampling method                                        Web   E-mail
Probability-based
  Surveys using a list-based sampling frame             ✓      ✓
  Surveys using non-list-based random sampling          ✓      ✓
  Intercept (pop-up) surveys                            ✓
  Mixed-mode surveys with Internet-based option         ✓      ✓
  Pre-recruited panel surveys                           ✓      ✓
Non-probability
  Entertainment polls                                   ✓
  Unrestricted self-selected surveys                    ✓
  Surveys using ‘harvested’ e-mail lists (and data)     ✓      ✓
  Surveys using volunteer (opt-in) panels               ✓

Surveys using a list-based sampling frame

Sampling for Internet-based surveys using a list-based sampling frame can be conducted just as one would for a traditional survey using a sampling frame. Simple random sampling in this situation is straightforward to implement and requires nothing more than contact information (generally an e-mail address for an Internet-based survey) on each unit in the sampling frame. Of course, though only contact information is required to field the survey, having additional information about each unit in the sampling frame is desirable to assess (and perhaps adjust for) nonresponse effects.

While Internet-based surveys using list-based sampling frames can be conducted either via the web or by e-mail, if an all-electronic approach is preferred the invitation to take the survey will almost always be made via e-mail. And, because e-mail lists of general populations are generally not available, this survey approach is most applicable to large homogeneous groups for which a sampling frame with e-mail addresses can be assembled (for example, universities, government organizations, large corporations, etc.). Couper (2000) calls these ‘list-based samples of high-coverage populations’.

In more complicated sampling schemes, such as stratified sampling, auxiliary information about each unit, such as membership in the relevant strata, must be available and linked to the unit’s contact information. And more complicated multi-stage and cluster sampling schemes can be difficult or even impossible to implement for Internet-based surveys. First, implementing them without having to directly contact respondents will likely require significant auxiliary data, which is unlikely to be available except in the case of specialized populations. Second, if (non-Internet-based) contact is required, then the researchers are likely to have to resort to the telephone or mail in order to ensure that sufficient coverage and response rates are achieved.

An example of a multi-stage sampling procedure, used for an Internet-based survey of real-estate journalists for which no sampling frame existed, is reported by Jackob et al. (2005). For this study, the researchers first assembled a list of publications that would have journalists relevant to the study. From this list a stratified random sample of publications was drawn, separately for each of five European countries. They then contacted the managing editor at each sampled publication and obtained the necessary contact information on all of the journalists that were ‘occupied with real-estate issues’. All of the journalists identified by the managing editors were then solicited to participate in a web survey. Jackob et al. (2005) concluded that it ‘takes a lot of effort especially during the phase of preparation and planning’ to assemble the necessary data and then to conduct an Internet-based survey using a multi-stage sampling methodology.

Surveys using non-list-based random sampling

Non-list-based random sampling methods allow for the selection of a probability-based sample without the need to actually enumerate a sampling frame. With traditional surveys, random digit dialing (RDD) is a non-list-based random sampling method that is used mainly for telephone surveys.

There is no equivalent of RDD for Internet-based surveys. For example, it is not possible (practically speaking) to generate random e-mail addresses (see the Issues and Challenges in Internet-based Survey Sampling section). Hence, with the exception of intercept surveys, Internet-based surveys requiring non-list-based random sampling depend on contacting potential respondents via some traditional means such as RDD, which introduces other complications and costs. For example, surveyors must either screen potential respondents to ensure they have Internet access or field a survey with multiple response modes. Surveys with multiple response modes introduce further complications, both in terms of fielding complexity and possible mode effects (again, see the Issues and Challenges in Internet-based Survey Sampling section).

Intercept surveys

Intercept surveys on the web are pop-up surveys that frequently use systematic sampling, surveying every kth visitor to a website or web page. These surveys seem to be most useful as customer-satisfaction surveys or marketing surveys. This type of systematic sampling can provide information that is generalizable to particular populations, such as those that visit a particular website or page. The surveys can be restricted to only those with certain IP (Internet Protocol) addresses, allowing one to target more specific subsets of visitors, and ‘cookies’ can be used to restrict the submission of multiple surveys from the same computer.
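A minimal sketch of the underlying mechanism is below: a counter-based gate that offers the survey to every kth visitor, beginning at a random offset. It is illustrative Python for a generic web application, not taken from any particular intercept-survey product; the interval and visitor counts are invented.

```python
# Every-kth-visitor intercept sampling (illustrative sketch only).
import itertools
import random

K = 10  # sampling interval: survey every 10th visitor
_start = random.randrange(K)  # random starting point, per the text
_counter = itertools.count()

def should_show_survey() -> bool:
    """Return True for every Kth visitor, beginning at a random offset."""
    return next(_counter) % K == _start

# In a real deployment this check would run once per visit, with a cookie
# set after the offer so the same browser is not intercepted repeatedly.
shown = sum(should_show_survey() for _ in range(1000))
print(f"Offered the survey to {shown} of 1000 simulated visitors")
```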

A potential issue with this type of survey is nonresponse. Coomly (2000) reports typical response rates in the 15 to 30 percent range, with the lowest response rates occurring for poorly targeted and/or poorly designed surveys. The highest response rates were obtained for surveys that were relevant to the individual, either in terms of the particular survey questions or, in the case of marketing surveys, the commercial brand being surveyed.

As discussed in Couper (2000), an important issue with intercept surveys is that there is no way to assess nonresponse bias, simply because no information is available on those that choose not to complete a survey. Coomly (2000) hypothesizes that responses may be biased towards those who are more satisfied with a particular product, brand, or website; towards those web browsers who are more computer and Internet savvy; and away from heavy Internet browsers who are conditioned to ignore pop-ups. Another source of nonresponse bias for intercept surveys implemented as pop-up browser windows may be pop-up blocker software, at least to the extent that pop-up blocker software is used differentially by various portions of the web-browsing community.

Pre-recruited panel surveys

Pre-recruited panel surveys are, generally speaking, groups of individuals who have agreed in advance to participate in a series of surveys. For Internet-based surveys requiring probability samples, these individuals are generally recruited via some means other than the web or e-mail – most often by telephone or postal mail.

For a longitudinal effort consisting of a series of surveys, researchers may recruit panel members specifically for that effort. For smaller efforts or for single surveys, a number of companies maintain panels of individuals, pre-recruited via a probability-based sampling methodology, from which sub-samples can be drawn according to a researcher’s specification. Knowledge Networks, for example, recruits all of its panel members via telephone using RDD, and it provides equipment and Internet access to those that do not have it, in an attempt to maintain a panel that is a statistically valid cross-section of the population (see Pineau and Dennis, 2004, for additional detail).

Pre-recruited, Internet-enabled panels can provide the speed of Internet-based surveys while simultaneously eliminating the often-lengthy recruitment process normally required. As such, they can be an attractive option to researchers who desire to field an Internet-based survey, but who require a sample that can be generalized to populations outside of the Internet-user community.

However, pre-recruited panels are not without their potential drawbacks. In particular, researchers should be aware that long-term panel participants may respond differently to surveys and survey questions than first-time participants (called ‘panel conditioning’ or ‘time-in-sample bias’). Also, nonresponse can be an issue if the combined loss of potential respondents throughout all of the recruitment and participation stages is significant. However, as Couper (2000) concludes, ‘…in theory at least, this approach begins with a probability sample of the full (telephone) population, and assuming no nonresponse error permits inference to the population…’.

Entertainment polls

Internet-based entertainment polls are ‘surveys’ conducted purely for their entertainment value (though they are sometimes passed off as more than what they are). On the Internet, they largely consist of websites where any visitor can respond to a posted survey. An example of an Internet-based entertainment poll is The Weekly Web Poll (www.weeklywebpoll.com). The telephone equivalents of these types of polls are call-in polls (or cell-phone text-message polls) such as those advertised on various television shows, where viewers can vote for their favourite contestant or character. Of course, Internet-based entertainment polls are as unscientific as call-in telephone polls.

Surveys using ‘Harvested’ e-mail lists

Harvested e-mail lists are sets of e-mail addresses collected from postings on the web and from individuals who are (wittingly or unwittingly) solicited for their e-mail addresses. There are many commercial entities (‘e-mail brokers’) that sell lists of e-mail addresses or access to lists of e-mail addresses (just Google ‘buy e-mail list’). Lists can also be assembled from resources on the web. For example, lists of Yahoo e-mail address holders by name or geographic area can be created by anyone via the Yahoo! People Search (http://email.people.yahoo.com/py/). Similarly, WorldEmail.com (www.worldemail.com) has an e-mail search feature by name.

However, it is important to note that harvesting e-mail addresses and distributing unsolicited e-mail related to surveys could be a violation of professional ethical standards and/or illegal. For example, Article 13(1) of the European Union’s Privacy and Electronic Communications Directive prohibits the sending of unsolicited commercial e-mail. In a similar vein, the Council of American Survey Research Organizations (CASRO) Code of Standards and Ethics for Survey Research clearly states that using harvested e-mail addresses and sending unsolicited e-mail is unethical (www.casro.org/codeofstandards.cfm). For a more detailed discussion of the ethical considerations and implications, see Eynon et al. and Charlesworth (this volume), and Krishnamurthy (2002) and the references contained therein.

Samples derived from harvested e-mail lists are non-probability samples because they are based on a convenience sample of e-mail addresses. For example, Email Marketing Blitz (www.email-marketing-blitz.com/customized_email_list.htm) says, ‘Our targeted optin [sic] email lists are updated monthly and gathered through special interest websites, entertainment websites and special alliances.’ Such samples should not be confused with list-based probability samples where the e-mail addresses in the list-based sample represent a (virtually) complete list of the e-mail addresses of some target population.

The efficacy of sending unsolicited surveys to a purchased or otherwise procured list of e-mail addresses is questionable. Not only do e-mail addresses turn over quite frequently, but many of those on the list may have been recruited either without their knowledge, or they may have inadvertently agreed by failing to uncheck a box when they signed up for something else. As a result, response rates are likely to be extremely low.

Unrestricted self-selected surveys

As with entertainment polls, unrestricted, self-selected surveys are surveys that are open to the public for anyone to participate in. They may simply be posted on a website so that anyone browsing through may choose to take the survey, or they may be promoted via website banners or other Internet-based advertisements, or they may be publicized in traditional print and broadcast media. Regardless of how they are promoted (or not), the key characteristics of these types of survey are that there are no restrictions on who can participate, and that it is up to the individual to choose to participate (opt in).

For example, Berson et al. (2002) conducted a web-based survey ‘to better understand the risks to adolescent girls online’ by posting a link to their survey on the Seventeen Magazine Online website. Via the survey, the authors collected data on 10,800 respondents with ‘identified behaviours that put them at risk’. The researchers were careful to appropriately qualify their results:

The results highlighted in this paper are intended to explore the relevant issues and lay the groundwork for future research on youth in cyberspace. This is considered an exploratory study which introduces the issues and will need to be supplemented with ongoing research on specific characteristics of risk and prevention intervention. Furthermore, the generalizability of the study results to the larger population of adolescent girls needs to be considered. Due to anonymity of the respondents, one of the limitations of the research design is the possibility that the survey respondents did not represent the experience of all adolescent girls or that the responses were exaggerated or misrepresented.

Unrestricted, self-selected surveys are a form of convenience sampling and, as such, the results cannot be generalized to a larger population. But as Berson et al. illustrate, that does not necessarily negate their usefulness for research.

The web can also facilitate access to individuals who are difficult to reach because they are hard to identify or locate, or perhaps exist in such small numbers that probability-based sampling would be unlikely to reach them in sufficient numbers. Coomber (1997) describes such a use of the web for fielding a survey to collect information from drug dealers about drug adulteration/dilution. By posting invitations to participate in a survey on various drug-related discussion groups, Coomber collected data from 80 survey respondents (that he deemed reliable) located in 14 countries on four different continents. The sample was certainly not generalizable, but it also provided data that was unlikely to be collected in any other way, and which Coomber found consistent with other research.

In addition, Alvarez et al. (2002) proposed that these types of non-probability sample can be useful and appropriate for conducting experiments (say, in the design of web pages or web surveys) by randomly assigning members of the sample to control and experimental groups. In terms of psychology experiments, Siah (2005) states, ‘For experimental research on the Internet, the advantage of yielding a heterogeneous sample seems persuasive considering that the most common criticism of psychological research is its over-reliance on college student samples.’


Volunteer (opt-in) panels

Volunteer (opt-in) panels are similar in concept to the pre-recruited panels, except that the volunteers are not recruited using a probability-based method. Rather, participants choose to participate, perhaps after coming across a solicitation on a website. In this regard, volunteer panels are similar to unrestricted, self-selected surveys except that those who opt in do so to take a continuing series of surveys. Harris Interactive manages such a volunteer panel. Its website (www.harrispollonline.com/became.asp) states, ‘You may have become a member of the Harris Poll Online in one of several ways:

• By registering directly with us through our website (http://www.harrispollonline.com); or

• By opting in to participate in the Harris Poll Online as a result of an offering made in conjunction with one of our many online partners.’

Often these panels are focused on market research, soliciting consumer opinions about commercial products, and participants sometimes do it for monetary incentives. For example, the Harris website states,

We offer the opportunity to earn HIPoints for the majority of our studies. On occasion a study will be conducted that will not have HIPoints associated with it, but this only occurs in exceptions. Once you’ve accumulated enough points you may redeem them for your choice of a variety of great rewards (www.harrispollonline.com/benefit.asp).

ISSUES AND CHALLENGES IN INTERNET-BASED SURVEY SAMPLING

All survey modes have their strengths and weaknesses; Internet-based surveys are no different in this regard. The various strengths and weaknesses are more or less important depending on the survey’s purpose. Drawing an appropriate sample that will provide the data necessary to address the research objective is critical. Hence, in this section we focus on the issues and challenges related to sampling for Internet-based surveys.

Sampling frame and coverage challenges

A frequent impediment to conducting large-scale, Internet-based surveys is the lack of a sampling frame. Simply put, no single registry or list of e-mail addresses exists, and thus list-based sampling frames are generally available only for specific populations (government organizations, corporations, etc.).

Compounding this difficulty, and leaving aside the issue of population coverage to be discussed shortly, it is impossible to employ a frameless sampling strategy, since for all practical purposes one cannot assemble random e-mail addresses. Of course, it is theoretically possible to ‘construct’ e-mail addresses by repeatedly randomly concatenating letters, numbers, and symbols, but the sheer variety of e-mail addresses means most of the constructed addresses will not work. More importantly, the unstructured nature of the Internet means that even if one could tolerate the multitude of undeliverable e-mail messages that would result, the constructed addresses would not be useful as the basis for a probability sample.

In terms of coverage, it is widely recognized that Internet-based surveys using only samples of Internet users do not generalize to the general public. While Internet penetration into households continues at a rapid pace, the penetration is far from complete (compared to, say, the telephone) and varies widely by country and region of the world.3 The point is, if the target of inference is the general public, considerable coverage error remains for any sample drawn strictly from Internet users.

Now, even if there is minimal coverage error for a particular Internet-based survey effort, when using only an Internet-based survey mode the target population must also be sufficiently computer-literate and have both regular and easy access to the Internet to facilitate responding to the survey. Simply put, just because an organization maintains a list of e-mail addresses for everyone in the organization, it does not necessarily follow that every individual on the list has equal access. Lack of equal access could result in significant survey selection and nonresponse biases.

Mixed-mode surveys using Internet-based and traditional media

For some surveys it may be fiscally and operationally possible to contact respondents by some mode other than e-mail, such as mail or telephone. In these cases the survey target population can be broader than that for which an e-mail sampling frame is available, up to and including the general population. But at present such a survey must also use multiple survey modes to allow respondents without Internet access the ability to participate. Mixed-mode surveys may also be useful for alleviating selection bias for populations with uneven or unequal Internet access, and the sequential use of survey modes can increase response rates.

For example, Dillman (2007: 456) describes a study in which surveys that were fielded using one mode were then followed up with an alternate mode three weeks later. As shown in Table 11.3, in all cases the response rate increased after the follow-up. Now, of course, some of this increase can be attributed simply to the fact that a follow-up effort was conducted. However, the magnitude of the increases also suggests that offering a different response mode in the follow-up could be beneficial.

However, mixed-mode surveys are subject to other issues. Two of the most important are mode effects and respondent mode preferences. Mode effects arise when the type of survey affects how respondents answer questions. Comparisons between Internet-based surveys and traditional surveys have found conflicting results, with some researchers reporting mode effects and others not. See, for example, the discussion and results in Schonlau et al. (2004: 130). Though not strictly a sampling issue, the point is that researchers should be prepared for the existence of mode effects in a mixed-mode survey. Vehovar and Manfreda’s overview chapter (this volume) explores in greater detail the issues of combining data from Internet-based and traditional surveys.

In addition, when Internet-based surveys are part of a mixed-mode approach, it is important to be aware that the literature currently seems to show that respondents will tend to favour the traditional survey mode over an Internet-based mode. See, for example, the discussions in Schonlau et al. (2002) and Couper (2000: 486–487). Fricker and Schonlau (2002), in a study of the literature on web-based surveys, found ‘that for most of the studies respondents currently tend to choose mail when given a choice between web and mail. In fact, even when respondents are contacted electronically it is not axiomatic that they will prefer to respond electronically’.

The tendency to favour non-Internet-based survey modes led Schonlau et al. (2002: 75) to recommend for mixed-mode mail and web surveys that:

… the most effective use of the Web at the moment seems to involve a sequential fielding scheme in which respondents are first encouraged to complete the survey via the Web and then nonrespondents are subsequently sent a paper survey in the mail. This approach has the advantage of maximizing the potential for cost savings from using Internet while maintaining the population coverage and response rates of a mail survey.

Table 11.3 As reported in Dillman (2007), using an alternate survey mode as a follow-up to an initial survey mode can result in higher overall response rates

Initial survey mode and response rate   Follow-up survey mode and combined response rate   Response rate increase
Mail (75%)                              Telephone (83%)                                     8%
Telephone (43%)                         Mail (80%)                                         37%
IVR5 (28%)                              Telephone (50%)                                    22%
Web (13%)                               Telephone (48%)                                    35%

Web-based recruitment issues and effects

Whether e-mail addresses are constructed, assembled from third-party sources, or harvested directly from the web, there is the issue of unsolicited survey e-mail as spam. For example, Sheehan (1999) conducted a survey with e-mail addresses harvested from www.Four11.com and stated, ‘Several individuals receiving the solicitation e-mail censured the researchers for sending out unsolicited e-mails, and accused the researchers of “spamming”.’ They further recounted that ‘One [ISP] system operator [who observed a large number of e-mail messages originating from a single address] then contacted his counterpart at our university.’

In addition, distributing an unsolicited Internet-based survey is also not without its perils. For example, Andrews et al. (2002) report on a study of ‘hard-to-involve Internet users’ – those who lurk in, but do not participate publicly in, online discussion forums. In their study, an invitation to participate in a web survey was posted as a message to 375 online community discussion boards. While they collected 1,188 valid responses (out of 77,582 discussion board members), they also ‘received unsolicited email offers, some of which were pornographic in content or aggressive in tone’ and they had their web server hacked twice, once with the infection of a virus.

In spite of the challenges and possible perils, it is possible to recruit survey participants from the web. For example, Alvarez et al. (2002) conducted two Internet-based recruitment efforts – one using banner advertisements on web pages and another using a subscription check box. In brief, their results were as follows.

• In the first recruitment effort, Alvarez et al. ran four ‘banner’ campaigns in 2000 with the intention of recruiting survey participants using web-page banner advertisements. In the first campaign, which is representative of the other three, an animated banner advertisement resulted in more than 3.5 million ‘impressions’ (the number of times the banner was displayed), which resulted in the banner being clicked 10,652 times, or a rate of 3 clicks per 1,000 displays. From these 10,652 clicks, 599 survey participants were recruited.

• In the second recruitment effort, the authors ran a ‘subscription’ campaign in 2001 in which they arranged with a commercial organization to have a check box added to subscription forms on various websites. Essentially, Internet users who were registering for some service were given an opportunity to check a box on the service’s subscription form indicating their willingness to participate in a survey. As part of this effort, the authors conducted two recruitment drives, each of which was intended to net 10,000 subscriptions. Across the two campaigns, 6,789 new survey participants were obtained from 21,378 subscribers.

The good news from the Alvarez et al. (2002) study is that, even though the banner approach yielded fewer new survey participants, both methods resulted in a significant number of potential survey respondents over a relatively short period of time: 3,431 new subjects over the course of six or seven weeks from the banner campaigns, and 6,789 new subjects over the course of three weeks from the subscription campaigns. Each banner subject cost about $7.29 to recruit, while the subscription subjects cost only $1.27 per subject. (Unfortunately, the authors did not present any data on survey completion rates, so we do not know whether there were differences between the two samples that might have favoured one over the other.)

The bad news is that the two groups differed significantly in all of the demographic categories collected (gender, age, race, and education), and they differed in how they answered questions on exactly the same survey. In addition, both groups differed significantly from the demographics of the Internet population as measured by the August 2000 Current Population Survey. The problem, of course, is that there are clear effects associated with how subjects are recruited, such that the resulting samples are different even from the general Internet population. Shillewaert et al. (1998) found similar recruitment-method biases. Hence, while it is possible to ethically recruit survey participants from the web, it seems that the recruitment methodology affects the types of individual that self-select into the sample.

Improving response rates for Internet-based surveys

Response rates have a direct effect on sampling: the higher the response rate, the fewer people need to be sampled to achieve a desired number of survey completions. In addition, higher response rates are associated with lower nonresponse bias.
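As a simple illustration of the first point, the short calculation below shows how many invitations are needed to reach a target number of completions at a given response rate; the target and rates are hypothetical.

```python
# How many invitations are needed for a target number of completions?
# (Illustrative arithmetic only; the rates and target are hypothetical.)
import math

target_completes = 400

for response_rate in (0.60, 0.30, 0.15):
    invites = math.ceil(target_completes / response_rate)
    print(f"response rate {response_rate:.0%}: invite {invites} people")
# At 60% roughly 667 invitations suffice; at 15% about 2,667 are needed.
```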

Unfortunately, in a summary of the academic survey-related literature up through 2001, Fricker and Schonlau (2002) concluded that ‘Web-only research surveys have currently only achieved fairly modest response rates, at least as documented in the literature.’ S. Fricker et al. (2005) similarly summarized the state of affairs as ‘Web surveys generally report fairly low response rates.’

A good illustration of this is the Couper et al. (1999) study, in which employees of five US federal government statistical agencies were randomly given a mail or e-mail survey. Comparable procedures were used for both modes, yet higher response rates were obtained for mail (68–76 percent) than for e-mail (37–63 percent) across all of the agencies.

Incentives are a common and effective means for increasing response rates in traditional surveys. Göritz (2006) is an excellent review of the use of incentives in survey research, one that distinguishes their use in traditional surveys from Internet-based surveys and provides a nice discussion of the issues associated with using incentives in Internet-based surveys. Open issues include:

• how best to deliver an incentive electronically;
• whether it is better to provide the incentive prior to a respondent taking the survey or after;
• whether incentives have different effects for individuals taking a survey one time versus pre-recruited panel members who take a series of surveys.

Individual studies of Internet-based surveys have generally found incentives to have little or no effect. For example, Coomly (2000) found that incentives had little effect on response rates for pop-up surveys, and Kypri and Gallagher (2003) found no effect in a web-based survey. However, Göritz (2006) conducted a meta-analysis of 32 experiments evaluating the impact of incentives on survey ‘response’ (the fraction of those solicited to take the survey that actually called up the first page of the survey) and 26 experiments evaluating the effect of incentives on survey ‘retention’ (the fraction of those who viewed the first page that actually completed the survey). From the meta-analysis, Göritz concluded that ‘material incentives promote response and retention in Web surveys’, where ‘material incentives increase the odds of a person responding by 19% over the odds without incentives’ and ‘an incentive increased retention by 4.2% on average’.
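Because an increase in the odds of responding is easy to misread as a percentage-point increase in the response rate itself, the sketch below converts a 19% odds increase into response-rate terms for a few hypothetical baseline rates (the baselines are our own illustration, not Göritz’s figures).

```python
# Converting a 19% increase in the odds of responding into response-rate
# terms (the baseline rates below are hypothetical, not from Göritz).
odds_ratio = 1.19

for p0 in (0.10, 0.30, 0.50):
    odds = p0 / (1 - p0) * odds_ratio       # inflate the odds by 19%
    p1 = odds / (1 + odds)                  # back to a probability
    print(f"baseline {p0:.0%} -> with incentive {p1:.1%} "
          f"(+{(p1 - p0) * 100:.1f} points)")
```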

In addition to incentives, Dillman (2007) and Dillman et al. (1999) have put forward a number of survey procedural recommendations to increase survey response rates, based on equivalent methods for traditional surveys; we do not re-cover them here since they are mainly related to survey design and fielding procedures. While the recommendations seem sensible, Couper (2000) cautions that 'there is at present little experimental literature on what works and what does not'.

Bigger samples are not always better

With Internet-based surveys using a list-based sampling frame, rather than sending the survey out to a sample, researchers often simply send the survey out to the entire sampling frame. That is, researchers naively conducting (all electronic) Internet-based surveys – where the marginal costs for additional surveys can be virtually nil – often fail to recognize the 'trade-off between easy, low cost access to large numbers of patients [participants] and the representativeness in the population being studied' (Soetikno et al., 1997). As we previously discussed, for both probability and non-probability-based samples, larger sample sizes do not necessarily mean the sample is more representative of any greater population: a sample can be biased whether it is large or small.
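A small simulation makes the point vivid. In the sketch below (entirely hypothetical numbers: a population in which 40 percent hold some attribute, and attribute-holders are twice as likely to opt in), every convenience sample converges on roughly 57 percent, no matter how large it grows; added size buys precision, not validity.

```python
import random

random.seed(1)

# Hypothetical population: 100,000 people, 40% with the attribute of interest.
population = [1] * 40_000 + [0] * 60_000

def convenience_sample(pop, n, opt_in_weight=2.0):
    """Draw n opt-in respondents, with attribute-holders opt_in_weight
    times as likely to self-select (sampling with replacement)."""
    weights = [opt_in_weight if y == 1 else 1.0 for y in pop]
    return random.choices(pop, weights=weights, k=n)

for n in (100, 1_000, 10_000, 100_000):
    est = sum(convenience_sample(population, n)) / n
    print(f"n = {n:>7}: estimated proportion = {est:.3f}")
# Each estimate hovers near 0.57, not the true 0.40: the bias does not
# shrink as the sample grows.
```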

One might argue that in these situations the researchers are attempting to conduct a census, but in practice they are forgoing a probability sample in favour of a convenience sample by allowing members of the sampling frame to opt into the survey. Dillman et al. (1999) summarized this practice as follows: '…the ease of collecting hundreds, thousands, or even tens of thousands of responses to web questionnaires at virtually no cost, except for constructing and posting, appears to be encouraging a singular emphasis on the reduction of sampling error'. By this Dillman et al. mean that researchers who focus only on reducing sampling error by trying to collect as large a sample as possible miss the point that it is equally important to reduce coverage, measurement, and nonresponse error in order to be able to accurately generalize from the sample data.

A myopic focus on large sample sizes – and the idea that large samples equate to sample representativeness, which equates to generalizability – occurs with convenience sample-based web and e-mail surveys as well. 'Survey2000' is an excellent example of this type of focus. A large-scale, unrestricted, self-selected survey, conducted as a collaborative effort between the National Geographic Society (NGS) and some academic researchers, Survey2000 was fielded in 1998. The survey was posted on the National Geographic Society's website, and participants were solicited both with a link on the NGS homepage and via advertisements in NGS periodicals, other magazines, and newspapers.

Upon completion of the effort, Witte et al. (2000) report that more than 80,000 surveys were initiated and slightly more than 50,000 were completed. While this is an impressively large number of survey completions, the unrestricted, self-selected sampling strategy clearly results in a convenience sample that is not generalizable to any larger population. Yet Witte et al. (2000) go to extraordinary lengths to rationalize that their results are somehow generalizable, while simultaneously demonstrating that the results of the survey generally do not correspond to known population quantities.

Misrepresenting convenience samples

A related and significant concern with non-probability-based sampling methods, for both Internet-based and traditional surveys, is that survey accuracy is characterized only in terms of sampling error, without regard to the potential biases that may be present in the results. While this has always been a concern with all types of survey, the ease and spread of Internet-based surveys seem to have exacerbated the practice. For example, the results of an 'E-Poll' were explained as follows:

THE OTHER HALF / E-Poll® Survey of 1,007 respondents was conducted January 16–20, 2003. A representative group of adults 18+ were randomly selected from the E-Poll online panel. At a 95% confidence level, a sample error of +/− 3% is assumed for statistics based on the total sample of 1,007 respondents. Statistics based on sub-samples of the respondents are more sensitive to sampling error. (From a press release posted on the E-Poll website.)

No mention was made in the press release that the 'E-Poll online panel' consists of individuals who had chosen to participate in online polls, nor that they were unlikely to be representative of the general population. Rather, it leaves readers with the incorrect impression that the results apply to the general population when, in fact, the margin of error for this particular survey is valid only for adult members of that particular E-Poll online panel.
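For readers wondering where the '+/− 3%' comes from: it is the standard half-width of a 95 percent confidence interval for a proportion under simple random sampling, easy to reproduce. The calculation below is ours and, as this section stresses, it speaks only to sampling error, and only if the sample were a probability sample in the first place:

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for a proportion under simple random
    sampling; p = 0.5 is the conservative (worst-case) choice."""
    return z * math.sqrt(p * (1.0 - p) / n)

print(round(margin_of_error(1007), 4))  # ~0.0309, i.e. about +/- 3 points
```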


In response to the proliferation of these types of misleading statement, the American Association for Public Opinion Research (AAPOR) has publicly stated that 'The reporting of a margin of sampling error associated with an opt-in or self-identified sample (that is, in a survey or poll where respondents are self-selecting) is misleading.' They go on to say, 'AAPOR considers it harmful to include statements about the theoretical calculation of sampling error in descriptions of such studies, especially when those statements mislead the reader into thinking that the survey is based on a probability sample of the full target population' (AAPOR, 2007).

IN SUMMARY

A useful way to summarize which sampling methods apply to which types of Internet-based survey is to group them by respondent 'contact mode'. That is, every survey effort can be classified according to how the respondents are contacted (the contact mode), how they are asked to complete the survey (the response mode), and how subsequent communication is conducted (the follow-up mode). Each of these can be executed in different media, where the media are telephone, mail, web, e-mail, and so forth. For example, respondents may be contacted by telephone to participate in a web survey with follow-up done by mail.
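This three-mode classification is simple enough to express as a record type. A minimal sketch (the class and field names are ours, merely one way to encode the taxonomy):

```python
from dataclasses import dataclass

@dataclass
class SurveyModes:
    contact: str    # how respondents are contacted
    response: str   # how they are asked to complete the survey
    follow_up: str  # how subsequent communication is conducted

# The example from the text: telephone contact, web response, mail follow-up.
design = SurveyModes(contact="telephone", response="web", follow_up="postal mail")
print(design)
```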

Explicitly specifying contact, response, and follow-up modes is often irrelevant for traditional surveys, since respondents who have been asked to take, say, a telephone survey have generally been contacted via the same mode. While this is not a strict rule – for example, a telephone survey may be preceded by a mailed invitation to each survey respondent – it is often the case. In comparison, given the challenges that we have discussed in this chapter, the contact, response, and follow-up modes are much more likely to differ with Internet-based surveys.

In terms of sampling for Internet-based and e-mail surveys, what is relevant is that the sampling methodology is generally driven by the contact mode, not the response mode. Hence, as shown in Table 11.4, we can organize sampling strategies by contact mode, listing for each contact method the sampling strategies mainly associated with it.

Note that in Table 11.4 we are focusing explicitly on an Internet-based survey response mode. So, for example, while systematic sampling can be applied to phone or mail surveys, the telephone is not likely to be used as a contact medium for an Internet-based survey using systematic sampling, and hence that combination does not appear in the table. Similarly, while there is a plethora of phone-in entertainment polls, neither the telephone nor postal mail is used to contact respondents to take Internet-based entertainment polls.
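Systematic sampling for a pop-up/intercept survey is straightforward to implement. A minimal sketch follows (the function name and the sampling interval are illustrative; a real implementation would sit in the web server or page script): invite every kth visitor, beginning at a random offset.

```python
import itertools
import random

def intercept_invitations(visitor_stream, k, start=None):
    """Systematic sampling: yield every k-th visitor from the stream,
    beginning at a random offset in [0, k)."""
    if start is None:
        start = random.randrange(k)
    for i, visitor in enumerate(visitor_stream):
        if i % k == start:
            yield visitor

# Invite every 20th visitor to the site; seed fixed for reproducibility.
random.seed(11)
visitors = (f"visitor-{i}" for i in itertools.count())
invited = intercept_invitations(visitors, k=20)
print([next(invited) for _ in range(3)])
```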

From Table 11.4 we can broadly summarize the current state of the art for the various Internet-based survey methods and their limitations as follows.

• Entirely web-based surveys, meaning surveys in which the potential respondents are contacted on the web and take a web survey, are chiefly limited to collecting data from non-probability-based samples.
  ◦ The exception is systematic sampling for pop-up/intercept surveys, which are predominantly used for customer-satisfaction types of survey associated with specific websites or web pages.
  ◦ Respondent contact for Internet-based surveys using non-probability samples can also be conducted via traditional (non-Internet-based) media and advertising.
• Research surveys that require probability sampling are very limited when using an Internet-based contact mode (web and e-mail).
  ◦ E-mail is useful as a contact mode only if a list of e-mail addresses is available. Such a list is an actual or de facto sampling frame, from which a sample may be drawn or a census attempted.
  ◦ The population of inference is usually quite limited when using an e-mail address sampling frame. It is generally the sampling frame itself.

  ◦ A poorly conducted census of an entire e-mail list may limit the survey results even further, since nonresponse and other biases may preclude generalizing even to the sample frame.

Table 11.4 Sampling strategies for Internet-based surveys by contact mode

Internet-based contact modes:
  Web: systematic sampling (probability-based); entertainment polls, unrestricted self-selected surveys, and volunteer (opt-in) panels (non-probability-based).
  E-mail: list-based sampling frames (probability-based); harvested e-mail lists (non-probability-based).

Non-Internet-based contact modes:
  Telephone: list-based sampling frames, non-list-based random sampling, mixed-mode surveys with an Internet-based option, and pre-recruited survey panels (all probability-based).
  Postal mail: list-based sampling frames and mixed-mode surveys with an Internet-based option (probability-based).
  In-person: non-list-based random sampling (probability-based).
  Other (TV, print advertising, etc.): entertainment polls, unrestricted self-selected surveys, and volunteer (opt-in) panels (non-probability-based).

• If the research objectives require inferring from the survey results to some general population, then respondents will most likely have to be contacted by a non-Internet-based medium.
  ◦ If the population of inference is a population in which some of the members do not have e-mail/web access, then the contact mode will have to be a non-Internet-based medium.
  ◦ Under such conditions, the survey will have to be conducted using a mixed mode, so that those without Internet access can participate. Conversely, lack of a non-Internet-based survey mode will result in coverage error, with the likely consequence of systematic bias.
  ◦ Pre-recruited panels can provide ready access to pools of Internet-based survey respondents, but to allow generalization to some larger, general population such panels need to be recruited using probability sampling methods from the general population (usually via RDD). And, even under such conditions, researchers need to carefully consider whether the panel is likely to be subject to other types of bias.

Looking to the future

So, what does the future hold for Internet-based survey sampling? At this point in the Internet's development, with its rapid expansion and continued evolution, it is truly impossible to say. Yet we can hazard a few guesses.

First, if the Internet continues to expand but largely maintains its current structure, then advances in sampling methods that will allow random sampling of, and inference to, general populations will be at best slow and difficult to develop. This follows from the fact that Internet-wide sampling frames are simply unavailable under the current Internet structure/organization, and no general frameless sampling strategies yet exist. Unless the way the Internet is organized and operated changes, it seems this will continue to be the case into the foreseeable future.

That said, survey sampling methodologists should endeavour to develop new sampling paradigms for Internet-based surveys. The fundamental requirement for a probability-based sampling scheme is that every member of the target population has a known, non-zero probability of being sampled. While in traditional surveys this can be achieved via various frame and frameless sampling strategies, it does not necessarily follow that Internet surveys must use those same sampling strategies. Rather, new sampling methods that take advantage of the unique characteristics of the Internet, such as the near-zero marginal cost for contacting potential respondents, should be explored and developed.
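One building block that any such paradigm could reuse is the Horvitz-Thompson idea: if each unit's inclusion probability is known and non-zero, weighting each observation by the inverse of that probability yields a design-unbiased population estimate, whatever the (possibly Internet-specific) mechanism that generated the probabilities. A minimal sketch with invented numbers:

```python
def horvitz_thompson_mean(values, inclusion_probs, population_size):
    """Horvitz-Thompson estimator of a population mean: weight each
    sampled value by the inverse of its known inclusion probability."""
    if any(p <= 0 for p in inclusion_probs):
        raise ValueError("all inclusion probabilities must be non-zero")
    total = sum(y / p for y, p in zip(values, inclusion_probs))
    return total / population_size

# Five sampled units drawn from two strata with different, known
# sampling rates (all figures hypothetical).
values = [1, 1, 0, 0, 1]
probs = [0.5, 0.5, 0.1, 0.1, 0.1]
print(horvitz_thompson_mean(values, probs, population_size=24))  # ~0.583
```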

In addition, researchers considering conducting an Internet-based survey should consider whether the capabilities of the web can be leveraged to collect the desired data in some other innovative fashion. For example, Lockett and Blackman (2004) present a case study of Xenon Laboratories, an Internet-based financial services firm that employed a novel approach to market research. Xenon Laboratories wanted to collect data on the foreign-exchange charges levied by credit-card companies on business travellers. They recognized that neither the travellers nor the credit-card companies were likely to respond to a survey on this topic, whether fielded over the web or otherwise. Instead, Xenon Laboratories developed the Travel Expenses Calculator (www.xe.com/tec) and the Credit Card Charges Calculator (www.xe.com/ccc) and posted them on the web for anyone to use for free. These tools help foreign business travellers to accurately calculate the cost of a business expense receipt in terms of their own currency.

Lockett and Blackman (2004) say, 'On the basis of this information [input by those using the calculators] it is possible to conduct basic market research by aggregating the inputted calculations. Xenon is now in the unique position to analyse whether or not the different card providers employ the same charging levels and whether or not these companies' charge structures vary according to geographical region.' They go on to conclude, 'This value-added approach, which is mutually beneficial to both parties, is an important and novel approach to market research.'

Second, it is also possible that technological innovation will facilitate other means of sampling for Internet-based surveys (and other means of conducting the surveys themselves, for that matter). For example, current trends seem to point towards a merging of the Internet with traditional technologies such as television and telephone. It is quite possible that sometime in the future all of these services will merge into one common household device through which a consumer could simultaneously watch television, surf the web, send e-mail, and place telephone calls. Depending on how this evolves, various types of random sampling methodologies, as well as new survey modes, may become possible. For example, it may become feasible, and perhaps even desirable, to sample respondents via RDD (or something very similar), have an interviewer call the respondent, and then in real time present the respondent with an interviewer-assisted, web-based survey.

It is also possible that, as the technology matures, these combined television–Internet–telephone appliances will become as ubiquitous as televisions are today, with service offered only by a few large companies. If so, then it may ultimately be possible to use the companies' subscriber listings as sampling frames (much as telephone directories were used pre-RDD in the mid-1900s). Or it may be that some other state emerges that lends itself to some form of sampling that is not possible today. The point is that the Internet is still very much in its infancy, and the current difficulties surrounding sampling for Internet-based surveys described in this chapter may or may not continue into the future.

To put this in a historical context, note that while the telephone was invented in the late 1800s, and telephone systems developed and expanded rapidly through the early 1900s, it was not until the mid-1900s that telephone coverage was sufficiently widespread, and standards for telephone numbers sufficiently uniform, to make RDD possible. In fact, the foundational ideas for an efficient RDD sampling methodology were not proposed until the early 1960s (Cooper, 1964), after which it took roughly another decade of discussion and development before RDD as we know it today became commonplace.4 Thus, in total, it was roughly a century after the invention of the telephone before RDD became an accepted sampling methodology.

In comparison, the web has been in existence, in a commercial sense, for only little more than a decade or two. As with the telephone in the late 1800s and early 1900s, we are in a period of technological innovation and expansion with the Internet. However, unlike the telephone, given today's pace of innovation, the Internet and how we use it is likely to be quite different even just a few years from now. How this affects sampling for Internet-based surveys remains to be seen.

NOTES

1 Readers interested in the mathematics should consult one of the classic texts such as Kish (1965) or Cochran (1977); readers interested in a summary treatment of the mathematics and/or a more detailed discussion of the sampling process may consult a number of other texts, such as Fink (2003) or Fowler (2002). For those specifically interested in sampling methods for qualitative research, see Patton (2002).

2 These were not the only errors made in the 1936 and 1948 US presidential election polls (for more detail, see Zetterberg (2004) or Ross (1977)), but they were significant errors, and are highlighted here to illustrate that the various biases that can creep into survey and data collection can be both subtle and non-trivial.

3 For example, as of January 2007 Internet World Stats (2007) reported that 35 countries had greater than 50 percent Internet penetration, ranging from a high of 86.3 percent for Iceland to 50 percent for the Czech Republic. In comparison, Internet penetration for the rest of the world was estimated to be 8.7 percent.

4 See, for example, 'Random Digit Dialing as a Method of Telephone Sampling' (Glasser and Metzger, 1972), 'An Empirical Assessment of Two Telephone Sampling Designs' (Groves, 1978), and 'Random Digit Dialing: A Sampling Technique for Telephone Surveys' (Cummings, 1979).

5 IVR stands for Interactive Voice Response. These are automated telephone surveys in which prerecorded questions are used and respondents' answers are collected using voice-recognition technology.

REFERENCES

Alvarez, R.M., Sherman, R.P. and VanBeselaere, C. (2002) 'Subject acquisition for web-based surveys', dated September 12, 2002, accessed online at survey.caltech.edu/alvarez.pdf on September 29, 2006.

American Association for Public Opinion Research (AAPOR) (2007) 'Reporting of margin of error or sampling error in online and other surveys of self-selected individuals'. Accessed online at www.aapor.org/pdfs/2006/samp_err_stmt.pdf on April 23, 2007.

Andrews, D., Nonnecke, B. and Preece, J. (2002) 'Electronic survey methodology: a case study in reaching hard-to-involve Internet users', International Journal of Human-Computer Interaction, 12 (2): 185–210.

Berson, I.R., Berson, M.J. and Ferron, J.M. (2002) 'Emerging risks of violence in the digital age: lessons for educators from an online study of adolescent girls in the United States', Journal of School Violence, 1 (2): 51–72. Published simultaneously online in Meridian, A Middle School Computer Technologies Journal, accessed at www2.ncsu.edu/unity/lockers/project/meridian/sum2002/cyberviolence/cyberviolence.pdf on September 29, 2006.

Cochran, William G. (1977) Sampling Techniques. New York: John Wiley.

Comley, P. (2000) 'Pop-up surveys: what works, what doesn't work and what will work in the future'. Proceedings of the ESOMAR Worldwide Internet Conference Net Effects 3, volume 237. Amsterdam, NL: ESOMAR. Accessed at www.virtualsurveys.com/news/papers/paper_4.asp on September 21, 2006.

Coomber, R. (1997) 'Using the Internet for survey research', Sociological Research Online, 2: 14–23.

Cooper, S.L. (1964) 'Random sampling by telephone: an improved method', Journal of Marketing Research, 1 (4): 45–8.

Couper, Mick P. (2000) 'Review: web surveys: a review of issues and approaches', The Public Opinion Quarterly, 64 (4): 464–94.

Couper, Mick P., Blair, J. and Triplett, T. (1999) 'A comparison of mail and e-mail for a survey of employees in federal statistical agencies', Journal of Official Statistics, 15 (1): 39–56.

Cummings, K.M. (1979) 'Random Digit Dialing: a sampling technique for telephone surveys', Public Opinion Quarterly, 43: 233–44.

Dillman, D.A. (2007) Mail and Internet Surveys: The Tailored Design Method (2007 update with new Internet, visual, and mixed-mode guide), 2nd edn. New York: John Wiley.

Dillman, D.A., Tortora, R.D. and Bowker, D. (1999) 'Principles for constructing web surveys', accessed online at www.sesrc.wsu.edu/dillman/papers/websurveyppr.pdf on January 27, 2005.

Fink, Arlene (2003) How to Sample in Surveys, 2nd edn: The Survey Kit, volume 7. Thousand Oaks: Sage.

Fowler, Jr., Floyd J. (2002) Survey Research Methods, 3rd edn: Applied Social Research Methods Series, volume 1. Thousand Oaks: Sage.

Fricker, Jr., R.D. and Schonlau, M. (2002) 'Advantages and disadvantages of Internet research surveys: evidence from the literature', Field Methods, 14: 347–67.

Fricker, S., Galesic, M., Tourangeau, R. and Yan, T. (2005) 'An experimental comparison of web and telephone surveys', Public Opinion Quarterly, 69 (3): 370–92.

Glasser, G.J. and Metzger, G.D. (1972) 'Random-Digit Dialing as a method of telephone sampling', Journal of Marketing Research, 9 (1): 59–64.

Göritz, A.S. (2006) 'Incentives in web studies: methodological issues and a review', International Journal of Internet Science, 1 (1): 58–70.

Groves, R.M. (1978) 'An empirical comparison of two telephone sample designs', Journal of Marketing Research, 15 (4): 622–31.

Groves, R.M. (1989) Survey Errors and Survey Costs. New York: John Wiley.

Internet World Stats (2005) Accessed online at www.internetworldstats.com/top25.htm on April 23, 2007.

Jackob, N., Arens, J. and Zerback, T. (2005) 'Sampling procedure, questionnaire design, online implementation, and survey response in a multinational online journalist survey'. Paper presented at the Joint WAPOR/ISSC Conference: Conducting International Social Surveys. Accessed online at ciss.ris.org/uploadi/editor/1132070316WAPORPaper.pdf on September 30, 2006.

Kish, Leslie (1965) Survey Sampling. New York: Wiley-Interscience. (New edition February 1995.)

Krishnamurthy, S. (2002) 'The ethics of conducting e-mail surveys'. Readings in Virtual Research Ethics: Issues and Controversies, accessed online at ssrn.com/abstract_id=651841 on September 30, 2006.

Kypri, K. and Gallagher, S.J. (2003) 'Incentives to increase participation in an Internet survey of alcohol use: a controlled experiment', Alcohol and Alcoholism, 38 (5): 437–41.

Lockett, A. and Blackman, I. (2004) 'Conducting market research using the Internet: the case of Xenon Laboratories', Journal of Business and Industrial Marketing, 19 (3): 178–87.

Midwest Book Review (2003) 'Book review: Take advantage of the Internet and preserve data integrity', accessed online at http://www.amazon.com/Conducting-Research-Surveys-E-Mail-Web/dp/0833031104/sr=1-3/qid=1160191573/ref=sr_1_3/102-5971652-1424906?ie=UTF8&s=books on October 6, 2006.

Patton, M.Q. (2002) Qualitative Evaluation and Research Methods. London: Sage.

Pineau, V. and Dennis, J.M. (2004) 'Methodology for probability-based recruitment for a web-enabled panel', dated November 21, 2004. Downloaded from www.knowledgenetworks.com/ganp/reviewer-info.html on July 21, 2006.

Ross, I. (1977) The Loneliest Campaign: The Truman Victory (reprint edition). Greenwood Press. Accessed online at www.trumanlibrary.org/whistlestop/study_collections/1948campaign/large/docs/documents/index.php?documentdate=1968-00-00&documentid=53&studycollectionid=Election&pagenumber=1 on October 1, 2006.

Schonlau, Matthias, Fricker, Jr., Ronald D. and Elliott, Marc N. (2002) Conducting Research Surveys via E-Mail and the Web, MR-1480-RC. Santa Monica: RAND.

Schonlau, Matthias, Zapert, K., Simon, L.P., Sanstad, K.H., Marcus, S.M., Adams, J., Spranca, M., Kan, H., Turner, R. and Berry, S.H. (2004) 'A comparison between responses from a propensity-weighted web survey and an identical RDD survey', Social Science Computer Review, 22 (1): 128–38.

Sheehan, K.B. (1999) 'Using e-mail to survey Internet users in the United States: methodology and assessment', Journal of Computer Mediated Communication, 4 (3). Accessed online at http://jcmc.indiana.edu/vol14/issue3/sheehan.html on July 6, 2006.

Shillewaert, N., Langerak, F. and Duhamel, T. (1998) 'Non-probability sampling for WWW surveys: a comparison of methods', Journal of the Market Research Society, 40 (4): 307–22.

Siah, C.Y. (2005) 'All that glitters is not gold: examining the perils and obstacles in collecting data on the Internet', International Negotiation, 10: 115–30.

Soetikno, R.M., Provenzale, D. and Lenert, L.A. (1997) 'Studying ulcerative colitis over the World Wide Web', American Journal of Gastroenterology, 92 (3): 457–60.

Squires, P. (1988) 'Why the 1936 "Literary Digest" poll failed', The Public Opinion Quarterly, 53 (1): 125–33.

Witte, J.C., Amoroso, L.M. and Howard, P.E.N. (2000) 'Research methodology: method and representation in Internet-based survey tools – mobility, community, and cultural identity in Survey2000', Social Science Computer Review, 18 (2): 179–95.

Wright, T. and Tsao, H.J. (1983) 'A frame on frames: an annotated bibliography'. In T. Wright (ed.), Statistical Methods and the Improvement of Data Quality. New York: Academic Press.

Zetterberg, H.L. (2004) 'US election 1948: the first great controversy about polls, media, and social science'. Paper presented at the WAPOR regional conference on 'Elections, News Media and Public Opinion', Pamplona, Spain, November 24–26. Accessed online at http://www.zetterberg.org/Lectures/l041115.htm on October 1, 2006.

FURTHER READING

Mail and Internet Surveys: The Tailored Design Method by Dillman (2007 edition) and Survey Errors and Survey Costs by Groves (1989). Each of these texts focuses on the entire process of designing and fielding surveys, not just sampling.

Conducting Research Surveys via E-Mail and the Web by Schonlau, Fricker and Elliott (2002) 'is a practical and accessible guide to applying the pervasiveness of the Internet to the gathering of survey data in a much faster and significantly less expensive manner than traditional means of phone or mail communications' (Midwest Book Review, 2003).

'Review: Web Surveys: A Review of Issues and Approaches' by Mick P. Couper, published in The Public Opinion Quarterly, is an excellent and highly cited article that emphasizes many of the points and ideas discussed in this chapter. It also provides additional examples to those presented in this chapter.

Sampling Techniques by Cochran (1977) is one of the classic texts on the mathematical details of survey sampling, covering a wide range of sampling methods applicable to all types of survey effort.