Sampling ESP 178 Applied Research Methods Calvin Thigpen 2/2/17 Adapted from lecture by Professor Susan Handy

Sampling - Environmental Science & Policy non-zero probability of being selected; sampling involves random selection (equal chance) Non-probability sampling . i.e. non-random: Do not




Ethics


Hypothetical example

To Test Housing Program, Some Are Denied Aid
New York Times, 12/8/2010

“Half of the test subjects — people who are behind on rent and in danger of being evicted —are being denied assistance from the program for two years, with researchers tracking them to see if they end up homeless.”

http://www.nytimes.com/2010/12/09/nyregion/09placebo.html?pagewanted=all&_r=0

Moving to Opportunity

“Moving to Opportunity for Fair Housing (MTO) is a 10-year research demonstration that combines tenant-based rental assistance with housing counseling to help very low-income families move from poverty-stricken urban areas to low-poverty neighborhoods.”

http://portal.hud.gov/hudportal/HUD?src=/programdescription/mto

Treatment group: Randomly selected households with children receive housing counseling and vouchers that must be used in areas with less than 10 percent poverty

Control groups: One already receiving vouchers, one just coming into voucher program

Ethical principles

• Minimize harm
• No deception
• Informed consent
• Identity protection
• Distribute benefits equitably

What we’ve covered…

Aspect of research: type of validity
What to study (conceptualization and operationalization): Measurement
Who to study (sampling): External (Generalizability)
How to study it (research design): Internal (Causal)


Sampling


Sampling is the (statistical) process of selecting a subset of a population of interest for purposes of making observations and (statistical) inferences about that population.

How it works

Unit of Observation

Population: Who we want to know about or generalize to (people living in selected area)
Sampling Frame: A list of units within the population from which you draw your sample
Sample: Who we collect data from

Types of Sampling

Probability sampling (i.e. random): Every element in the population has a known, non-zero probability of being selected; sampling involves random selection (an equal chance, in the simplest designs).
Non-probability sampling (i.e. non-random): We do not know in advance how likely it is that any element of the population will be selected for the sample; selection is non-random (not an equal chance).

Probability sampling

Goal is a representative sample: one that resembles the population of interest

Generalizability

[Diagram: Population, Sample, and A Different Population. Sample generalizability links the sample to the population it was drawn from; cross-population generalizability links it to a different population.]

Sampling Error

Difference between characteristics of sample and characteristics of population

Random sampling error: Inherent in process of sampling! Measure it with confidence intervals (see below).
Systematic sampling error: Depends on how good your sampling method is! Use good methods to minimize this.

The Sampling Frames Challenge

Issues?

Alternatives?

Probability Sampling Methods

Simple Random Sampling: Elements chosen completely by chance
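Simple random sampling can be sketched in a few lines of Python. This is only an illustration: the sampling frame (1,000 ID numbers), the sample size, and the seed are made-up assumptions, not values from the lecture.

```python
import random

# Hypothetical sampling frame: ID numbers for 1,000 population elements.
sampling_frame = list(range(1, 1001))


def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; every element has the same chance."""
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    return rng.sample(frame, n)


sample = simple_random_sample(sampling_frame, n=50, seed=178)
print(len(sample), sorted(sample)[:5])
```

Drawing without replacement means no element can appear in the sample twice.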

Probability Sampling Methods

Systematic Random Sampling: Select first element randomly, then pick every nth element

To select first element, then every 10th element

OR to select element on each page
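A minimal sketch of systematic random sampling, assuming the sampling frame is an ordered list. The frame of 1,000 placeholder names and the interval of 10 are hypothetical.

```python
import random


def systematic_sample(frame, interval, seed=None):
    """Pick a random start within the first interval, then every interval-th element."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random position from 0 to interval - 1
    return frame[start::interval]


# Hypothetical ordered frame, e.g. names as listed in a directory.
frame = [f"person_{i:04d}" for i in range(1000)]
sample = systematic_sample(frame, interval=10, seed=178)
print(len(sample), sample[:3])
```

Only the starting point is random. If the list has a repeating pattern with the same period as the interval, the sample can be biased, so it is worth checking how the frame is ordered.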

Probability Sampling Methods

Stratified Random Sampling (Proportionate): Sort population into strata; randomly select within strata; proportionate to population

If sampling frames are available for strata but not for the overall population.
To have more homogeneous samples (see below).
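Proportionate stratified sampling could look like the sketch below. The strata (undergrad, grad, staff), their sizes, and the 10 percent sampling fraction are invented for illustration.

```python
import random
from collections import defaultdict


def proportionate_stratified_sample(frame, stratum_of, fraction, seed=None):
    """Sample the same fraction of every stratum, so the sample mirrors the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)  # sort population into strata
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)  # stratum's share of the sample
        sample[name] = rng.sample(members, k)  # random selection within the stratum
    return sample


# Hypothetical campus frame: (stratum, id) pairs.
frame = (
    [("undergrad", i) for i in range(700)]
    + [("grad", i) for i in range(200)]
    + [("staff", i) for i in range(100)]
)
sample = proportionate_stratified_sample(frame, stratum_of=lambda e: e[0], fraction=0.10, seed=178)
print({name: len(members) for name, members in sample.items()})
```

Each stratum makes up the same share of the sample as it does of the population (here 70, 20, and 10 out of 100).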

Probability Sampling Methods

Stratified Random Sampling (Disproportionate): Sort population into strata; randomly select within strata; disproportionate to population

To ensure enough elements in small strata.
Strata defined by some characteristic…
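A sketch of disproportionate stratified sampling, assuming the simplest version: the same number of elements is drawn from every stratum. The strata and sizes are hypothetical. The weights are an addition not shown on the slide; they record how many population elements each sampled element stands for, which is needed if the strata are later combined into one population-level estimate.

```python
import random


def disproportionate_stratified_sample(strata, n_per_stratum, seed=None):
    """Draw the same number of elements from every stratum, whatever its size."""
    rng = random.Random(seed)
    sample, weights = {}, {}
    for name, members in strata.items():
        k = min(n_per_stratum, len(members))
        sample[name] = rng.sample(members, k)
        weights[name] = len(members) / k  # population elements represented per sampled element
    return sample, weights


# Hypothetical strata of very different sizes.
strata = {
    "undergrad": list(range(700)),
    "grad": list(range(200)),
    "staff": list(range(100)),
}
sample, weights = disproportionate_stratified_sample(strata, n_per_stratum=30, seed=178)
print({name: (len(sample[name]), round(weights[name], 2)) for name in strata})
```

The small stratum (staff) gets 30 cases instead of the 3 or so it would get under a proportionate design with the same total sample size.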


Campus Travel Survey Sampling


http://its.ucdavis.edu/research/publications/publication-detail/?pub_id=2537

Campus Mode Share

Probability Sampling Methods

Cluster Sampling: Draw random sample of clusters, then select elements within clusters

Cluster = naturally occurring grouping, e.g. neighborhoods, classes, etc. Often used for practical purposes, such as in-person data collection.
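A two-stage cluster sample might be sketched as follows. The 20 neighborhoods of 50 households each, and the choice of 4 clusters with 10 households per cluster, are made-up numbers.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly pick clusters. Stage 2: randomly pick elements within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random sample of cluster names
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical clusters: 20 neighborhoods with 50 households each.
clusters = {
    f"neighborhood_{c:02d}": [f"hh_{c:02d}_{h:02d}" for h in range(50)]
    for c in range(20)
}
sample = cluster_sample(clusters, n_clusters=4, n_per_cluster=10, seed=178)
print({name: households[:2] for name, households in sample.items()})
```

Only the selected neighborhoods need to be visited, which is the practical advantage for in-person data collection.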


Exit Polling

Difference between Stratified-Random and Cluster Sampling

Stratified Random: Divide population into groups, then randomly sample units within every group.
Cluster: Divide population into groups, draw random sample of groups, then randomly sample units within the selected groups.

Probability Sampling Methods

Simple Random Sampling: Elements chosen completely by chance
Systematic Random Sampling: Select first element randomly, then pick every nth element
Stratified Random Sampling (Proportionate): Sort population into strata; randomly select within strata; proportionate to population
Stratified Random Sampling (Disproportionate): Sort population into strata; randomly select within strata; disproportionate to population
Cluster Sampling: Draw random sample of clusters, then select elements within clusters
Matched-pairs Sampling: Divide population into two groups by key characteristic; draw random sample in Group 1, then find matches in Group 2


How do you know if you have a representative sample?


Source: Thigpen and Volker, 2017

Compare to census data…


Does sample size matter?

Inferential Statistics for Probability Samples

n: Sample size
Sample statistic: Statistic computed from sample, e.g. mean
Population parameter: True value of statistic, e.g. mean, for population
Sampling error: Population parameter minus sample statistic

We don’t know the population parameter! So we don’t know the sampling error!

But we can estimate a confidence interval…

Sample Statistic

Standard deviation = how close individual scores are to the sample mean

[Figure: distribution of individual scores around the sample mean]

Let’s say we take a bunch of samples…

Standard error = how close mean scores from repeated samples are to the population mean

[Figure: distribution of sample means around the population mean]
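The idea of taking a bunch of samples can be tried out in a small simulation. The population below is invented (10,000 scores drawn from a normal distribution with mean 5 and standard deviation 3), as are the sample size and number of repeats. The sketch compares the spread of the sample means with the population standard deviation divided by the square root of n.

```python
import random
import statistics

rng = random.Random(178)

# Hypothetical population of 10,000 scores.
population = [rng.gauss(5.0, 3.0) for _ in range(10_000)]
pop_sd = statistics.pstdev(population)


def sample_means(population, n, repeats, rng):
    """Draw many simple random samples of size n and return the mean of each."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]


n = 25
means = sample_means(population, n, repeats=2000, rng=rng)

observed_se = statistics.stdev(means)  # spread of the repeated sample means
predicted_se = pop_sd / n ** 0.5       # population SD divided by sqrt(n)
print(round(observed_se, 3), round(predicted_se, 3))
```

The two numbers should come out close to each other, which is why a single sample's standard deviation and size are enough to estimate the standard error in practice.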

Calculating a confidence interval

$$\text{standard error} = \frac{\text{standard deviation}}{\sqrt{n}}$$

$$\text{confidence interval} = \text{statistic} \pm 2 \times \text{standard error}$$
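As a worked illustration of the two formulas, the sketch below computes a confidence interval for a mean from raw scores. The ten scores are made up, and the multiplier of 2 follows the rule of thumb on the slide.

```python
import statistics


def confidence_interval(scores, multiplier=2.0):
    """Return (low, high) = mean -/+ multiplier * standard error."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation
    se = sd / n ** 0.5             # standard error = SD / sqrt(n)
    return mean - multiplier * se, mean + multiplier * se


# Hypothetical scores from a sample of 10 people.
scores = [2, 4, 4, 5, 7, 3, 6, 5, 4, 5]
low, high = confidence_interval(scores)
print(round(low, 2), round(high, 2))
```

For these scores the mean is 4.5 and the interval runs from about 3.59 to about 5.41.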

The 68-95-99.7 percent rule for confidence intervals
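The percentages behind this rule can be checked with `statistics.NormalDist` from the Python standard library. The sketch assumes the sample statistic is approximately normally distributed, which is the usual assumption behind the rule.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage(k):
    """Probability that a normal value falls within k standard errors of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard error(s): {coverage(k):.1%}")

# Multiplier that gives exactly 95% coverage (about 1.96, often rounded to 2).
z_95 = standard_normal.inv_cdf(0.975)
print(round(z_95, 2))
```

This is also where the multiplier of roughly 2 in the confidence interval formula comes from.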

Calculating a confidence interval

$$\text{standard error} = \frac{\text{standard deviation}}{\sqrt{n}}$$

How do we reduce standard error?

More homogeneous populations mean tighter confidence intervals: SD ↓ → SE ↓ → CI ↓

Larger sample sizes mean tighter confidence intervals: n ↑ → SE ↓ → CI ↓

$$\text{95\% confidence interval} = \text{mean} \pm 1.96 \times \text{standard error}$$
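Both effects can be seen by plugging numbers into the formula. The standard deviations and sample sizes below are arbitrary illustrative values.

```python
def ci_half_width(sd, n, multiplier=1.96):
    """Half-width of the 95% confidence interval: 1.96 * SD / sqrt(n)."""
    return multiplier * sd / n ** 0.5


# Larger n with the same (hypothetical) SD of 3.0: the interval narrows.
for n in (25, 100, 400):
    print(f"SD=3.0, n={n:>3}: +/- {ci_half_width(3.0, n):.3f}")

# Smaller SD with the same n: the interval also narrows.
for sd in (3.0, 1.5):
    print(f"SD={sd}, n=100: +/- {ci_half_width(sd, 100):.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the width of the interval, while halving the standard deviation halves it directly.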

Example

[Figure: number of students by pints per week (1 to 13), UCD vs. CSUC]

Calculation of Confidence Intervals

Mean: UCD 4.1, CSUC 6.0
Standard Deviation: UCD 2.54, CSUC 3.32

Calculation of Confidence Intervals

Standard Error if n=20 ($= SD/\sqrt{n}$): UCD 1.26, CSUC 1.36
95% CI low ($= \text{mean} - 2 \times SE$): UCD 1.57, CSUC 3.34
95% CI high ($= \text{mean} + 2 \times SE$): UCD 6.53, CSUC 8.66

Are the means different? i.e. do the confidence intervals overlap?

Calculation of Confidence Intervals

Mean: UCD 4.1, CSUC 6.0
Standard Deviation: UCD 2.54, CSUC 3.32
Standard Error if n=20: UCD 1.26, CSUC 1.36
95% CI low (n=20): UCD 1.57, CSUC 3.34
95% CI high (n=20): UCD 6.53, CSUC 8.66
Standard Error if n=200: UCD 0.18, CSUC 0.24
95% CI low (n=200): UCD 3.70, CSUC 5.54
95% CI high (n=200): UCD 4.40, CSUC 6.46

Are the means different?
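The overlap question can be checked by applying the formulas to the means and standard deviations in the table. The sketch below assumes a multiplier of 1.96 and the same n for both campuses. Because the table's inputs are rounded, small differences from the printed intervals are to be expected.

```python
def ci95(mean, sd, n, multiplier=1.96):
    """95% confidence interval for a mean: mean -/+ 1.96 * SD / sqrt(n)."""
    se = sd / n ** 0.5
    return mean - multiplier * se, mean + multiplier * se


def intervals_overlap(a, b):
    """True if intervals a = (low, high) and b = (low, high) share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


campuses = {"UCD": (4.1, 2.54), "CSUC": (6.0, 3.32)}  # (mean, SD) taken from the table

for n in (20, 200):
    intervals = {name: ci95(mean, sd, n) for name, (mean, sd) in campuses.items()}
    overlap = intervals_overlap(intervals["UCD"], intervals["CSUC"])
    rounded = {name: (round(lo, 2), round(hi, 2)) for name, (lo, hi) in intervals.items()}
    print(f"n={n}: {rounded} overlap={overlap}")
```

For n = 200 the result is close to the table. For n = 20 the formula gives standard errors of about 0.57 and 0.74, smaller than the 1.26 and 1.36 printed in the table, so that row may rest on inputs not shown on the slide. The conclusion is the same either way: the intervals overlap at n = 20 and separate at n = 200.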

Another way to think about sample size: Power Analysis

• Relates to experimental design: effect size

Power = probability that the statistical analysis detects the effect
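A rough sketch of how power depends on sample size, using a normal approximation for comparing two group means. The approximation, the 5 percent significance level, and the equal group sizes are assumptions added here, not part of the lecture. The effect size (1.9) and standard deviations (2.54 and 3.32) are borrowed from the pints-per-week example purely for illustration.

```python
from statistics import NormalDist


def approximate_power(effect, sd1, sd2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test for a difference in two means.

    Uses the normal approximation and ignores the tiny chance of a
    significant result in the wrong direction.
    """
    z = NormalDist()
    se_diff = (sd1 ** 2 / n_per_group + sd2 ** 2 / n_per_group) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z.cdf(abs(effect) / se_diff - z_crit)


power_small = approximate_power(1.9, 2.54, 3.32, n_per_group=20)
power_large = approximate_power(1.9, 2.54, 3.32, n_per_group=200)
print(f"n=20 per group:  power about {power_small:.2f}")
print(f"n=200 per group: power about {power_large:.2f}")
```

With 20 per group the chance of detecting this difference is only around one in two; with 200 per group it is close to certain.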

Sample size matters

Larger sample is better, up to a point

[Figure: two diagrams, each showing a sample within a population]

Percent women based on random sample…

Sample size 5: margin of error 44%, range for estimate 6 to 94%
Sample size 10: margin of error 31%, range for estimate 19 to 81%
Sample size 20: margin of error 22%, range for estimate 28 to 72%
Sample size 30: margin of error 17%, range for estimate 33 to 67%
Sample size 40: margin of error 15%, range for estimate 35 to 65%
Sample size 50: margin of error 13%, range for estimate 37 to 63%

http://www.methodspace.com/group/qualitativeinterviewing/forum/topics/two-problems-with-random-sampling-and-two-alternatives
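The source does not state how these margins were calculated. One common formula that comes close is the 95% margin of error for a proportion, assuming the true share of women is 50 percent. The sketch below uses that assumption and matches the table to within about one percentage point.

```python
def margin_of_error(n, p=0.5, multiplier=1.96):
    """95% margin of error for a proportion p estimated from a simple random sample of n."""
    return multiplier * (p * (1 - p) / n) ** 0.5


for n in (5, 10, 20, 30, 40, 50):
    moe = margin_of_error(n)
    print(f"n={n:>2}  margin of error about {moe:.0%}  range about {0.5 - moe:.0%} to {0.5 + moe:.0%}")
```

Going from 5 to 20 cases cuts the margin of error in half, but going from 40 to 50 barely changes it, which is the sense in which a larger sample is better only up to a point.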


A few examples


Prof. Handy’s Cul-de-Sac Sampling Plan:

The household survey will provide the primary data for testing the hypothesis. We expect the design of the sampling plan for the survey and the design of the survey instrument itself to be particularly challenging for this study. The target population for this study is children living in houses located on cul-de-sacs and through streets in the Sacramento region. Because no sampling frame exists for this population, we will use a multi-stage cluster sampling strategy. First, residential neighborhoods throughout the region will be defined based on major roadways and other geographic features. Census data will be used to eliminate neighborhoods built before 1950 because of the infrequent use of cul-de-sacs in residential developments before this time. From the remaining post-1950 neighborhoods, a random sample of neighborhoods will be selected. Within these neighborhoods, cul-de-sacs will be identified using the 2000 Census street network and the capabilities of geographic information systems (GIS). A random sample of the cul-de-sacs within each neighborhood will be chosen. For each cul-de-sac in the sample, a segment of a nearby through street (defined as a street that links arterial streets and carries significant levels of through traffic) of otherwise similar characteristics and similar length will be chosen. This approach creates matched pairs of streets. Next, all addresses on the sample street pairs will be compiled to create a sample of households. The sample of streets will be used in the field observations, the sample of households will be used in the household survey, and a sub-sample of the sample of households will be used in the in-depth interviews, as described below.


Handy, et al., 2005


Handy, et al., 2005

Other things to think about!

Goal: Ensuring that the independent variable varies
Example 1 (child as unit of analysis): Sample includes children who live on cul-de-sacs and children who don’t
Example 2 (neighborhood as unit): Sample includes neighborhoods that have lots of cul-de-sacs and neighborhoods that have few cul-de-sacs

Goal: Ensuring that the control variables don’t vary (aka “elimination”)
Example 1 (child as unit of analysis): Sample includes only children from moderate-income households
Example 2 (neighborhood as unit): Sample includes only neighborhoods with average income in the moderate range

Types of Sampling

Probability sampling (i.e. random): Every element in the population has a known, non-zero probability of being selected; sampling involves random selection (an equal chance, in the simplest designs).
Non-probability sampling (i.e. non-random): We do not know in advance how likely it is that any element of the population will be selected for the sample; selection is non-random (not an equal chance).

Non-Probability Sampling

Availability or Convenience Sampling: Cases selected because they’re easy to find
Quota Sampling (proportional or not): Groups defined by key characteristics; specified number of cases selected in each group.
Purposive Sampling: Individuals selected for sample because they possess a unique trait
Expert Sampling: Individuals selected for sample because of their knowledge (“key informants”)
Snowball Sampling: Start with initial sample, ask them to recommend other participants

Used for exploratory and qualitative research. Must be very cautious about generalizing!

To do

• Sampling exercise on Tuesday (2/7)
• Read and study for midterm (2/14)
• Don’t forget lecture slides and lecture summary notes on website!
• Office Hours
  • Calvin: Wed 9-11 AM
  • Dillon: Fri 8-9 + 10-11 AM
• Paper critique in discussion section on Friday
  • READ Lubell et al. 2009 before discussion tomorrow!