
Sampling

ESP 178 Applied Research Methods

Calvin Thigpen

2/2/17

Adapted from lecture by Professor Susan Handy

Ethics

Hypothetical example

Movie Trailer BBC story

https://www.youtube.com/watch?v=3XN2X72jrFk

https://www.youtube.com/watch?v=760lwYmpXbc

To Test Housing Program, Some Are Denied Aid

New York Times, 12/8/2010

“Half of the test subjects — people who are behind on rent and in danger of being evicted —are being denied assistance from the program for two years, with researchers tracking them to see if they end up homeless.”

http://www.nytimes.com/2010/12/09/nyregion/09placebo.html?pagewanted=all&_r=0

“Moving to Opportunity for Fair Housing (MTO) is a 10-year research demonstration that combines tenant-based rental assistance with housing counseling to help very low-income families move from poverty-stricken urban areas to low-poverty neighborhoods.”

Moving to Opportunity

http://portal.hud.gov/hudportal/HUD?src=/programdescription/mto

Treatment group: Randomly selected households with children receive housing counseling and vouchers that must be used in areas with less than 10 percent poverty

Control groups: One already receiving vouchers, one just coming into voucher program

Ethical principles

• Minimize harm
• No deception
• Informed consent
• Identity protection
• Distribute benefits equitably

What we’ve covered…

Aspect of Research | Type of Validity
What to study (conceptualization and operationalization) | Measurement
Who to study (sampling) | External (Generalizability)
How to study it (research design) | Internal (Causal)

Sampling

Sampling is the (statistical) process of selecting a subset of a population of interest for purposes of making observations and (statistical) inferences about that population.

How it works

Unit of Observation

Population: Who we want to know about or generalize to, e.g. people living in selected area

Sampling Frame: A list of units within population from which you draw your sample

Sample: Who we collect data from

Types of Sampling

Type | Definition
Probability sampling (i.e. random) | Every element in the population has a known, non-zero probability of being selected; sampling involves random selection (equal chance under simple random sampling)
Non-probability sampling (i.e. non-random) | Do not know in advance how likely it is that any element of the population will be selected for the sample; non-random selection (not equal chance)

Probability sampling

Goal is a representative sample: one that resembles the population of interest

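[Figure: a sample drawn from the population]

Generalizability

Sample generalizability: from the sample to the population it was drawn from

Cross-population generalizability: from the sample to a different population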

Sampling Error

Difference between characteristics of sample and characteristics of population

Random sampling error

Inherent in process of sampling! Measure it with confidence intervals (see below).

Systematic sampling error

Depends on how good your sampling method is! Use good methods to minimize this.
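The contrast between the two kinds of sampling error can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the population, the "listed" and "unlisted" groups, and all numbers are hypothetical, and the incomplete sampling frame stands in for a poor sampling method. Random error shrinks as the sample grows; the systematic error from the incomplete frame does not.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 10,000 people. 70% appear on the list used as
# the sampling frame ("listed"); 30% do not, and the two groups differ.
listed = [rng.gauss(50, 10) for _ in range(7_000)]
unlisted = [rng.gauss(30, 10) for _ in range(3_000)]
population = listed + unlisted
population_mean = statistics.mean(population)

def sampling_error(frame, n):
    """Population mean minus the mean of one random sample of size n."""
    return population_mean - statistics.mean(rng.sample(frame, n))

for n in (50, 500, 5_000):
    complete = sampling_error(population, n)   # random error only
    incomplete = sampling_error(listed, n)     # random + systematic error
    print(f"n={n:>5}  complete frame: {complete:+.2f}  incomplete frame: {incomplete:+.2f}")
```

In this setup a larger sample does not rescue the incomplete frame, because every sample drawn from it misses the same part of the population.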

The Sampling Frames Challenge

Issues?

Alternatives?

Probability Sampling Methods

Method | Definition

Simple Random Sampling | Elements chosen completely by chance

Systematic Random Sampling | Select first element randomly, then pick every nth element

To select first element, then every 10th element, OR to select element on each page
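The two methods above can be sketched in a few lines. This is an illustrative sketch, not a prescribed procedure: the sampling frame of 100 households and the function names are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Choose n elements completely by chance, without replacement."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_random_sample(frame, step, seed=None):
    """Pick a random start among the first `step` elements, then every step-th element."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return frame[start::step]

# Hypothetical sampling frame: a list of 100 households.
frame = [f"household_{i:03d}" for i in range(100)]

srs = simple_random_sample(frame, 10, seed=1)
sys_sample = systematic_random_sample(frame, 10, seed=1)
print(srs)
print(sys_sample)
```

Systematic sampling only needs one random draw, which is why it suits lists such as directories, but it assumes the list order has no pattern that lines up with the step.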



Stratified Random Sampling (Proportionate) | Sort population into strata; randomly select within strata; proportionate to population

Use if sampling frames are available for strata but not for the overall population, or to have more homogeneous samples (see below).





Stratified Random Sampling (Disproportionate) | Sort population into strata; randomly select within strata; disproportionate to population

To ensure enough elements in small strata. Strata defined by some characteristic…
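Proportionate and disproportionate stratified sampling differ only in how many elements are drawn from each stratum. The sketch below uses hypothetical campus strata and sizes. The weights at the end (stratum size divided by stratum sample size) are a common way to correct a disproportionate sample when estimating for the whole population; they are added here as an assumption, not taken from the slides.

```python
import random

def proportionate_sizes(strata, total_n):
    """Allocate total_n across strata in proportion to stratum size (rounded)."""
    population = sum(len(members) for members in strata.values())
    return {name: round(total_n * len(members) / population)
            for name, members in strata.items()}

def stratified_sample(strata, sizes, seed=None):
    """Randomly select sizes[name] elements within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}

# Hypothetical strata for a campus population of 1,000.
strata = {
    "undergraduate": [f"ug_{i}" for i in range(800)],
    "graduate": [f"grad_{i}" for i in range(150)],
    "faculty": [f"fac_{i}" for i in range(50)],
}

prop = proportionate_sizes(strata, 100)
prop_sample = stratified_sample(strata, prop, seed=2)

# Disproportionate: oversample the small strata so each has enough cases.
disp = {"undergraduate": 40, "graduate": 30, "faculty": 30}
disp_sample = stratified_sample(strata, disp, seed=2)

# Each sampled element stands for this many population elements.
weights = {name: len(strata[name]) / disp[name] for name in strata}
print(prop, disp, weights)
```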

Campus Travel Survey Sampling

http://its.ucdavis.edu/research/publications/publication-detail/?pub_id=2537

Campus Mode Share







Cluster Sampling | Draw random sample of clusters, then select elements within clusters

Cluster = naturally occurring grouping, e.g. neighborhoods, classes, etc. Often used for practical purposes – if in-person data collection.

Exit Polling

Difference between Stratified-Random and Cluster Sampling

Stratified Random: Divide population into groups, then randomly sample units within every group

Cluster: Divide population into groups, draw random sample of groups, then randomly sample units within the sampled groups
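The extra step that distinguishes cluster sampling, drawing a random sample of groups before sampling units, can be sketched as follows. The neighborhoods and households are hypothetical, and the function name is illustrative.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Draw a random sample of clusters, then randomly select elements within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical clusters: 8 neighborhoods with 40 households each.
clusters = {
    f"neighborhood_{k}": [f"n{k}_household_{i}" for i in range(40)]
    for k in range(8)
}

sample = cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=3)
for name, households in sample.items():
    print(name, households)
```

Only three of the eight neighborhoods appear in the result, whereas a stratified sample would include households from all eight.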








Matched-pairs Sampling | Divide population into two groups by key characteristic; draw random sample in Group 1, then find matches in Group 2

How do you know if you have a representative sample?

Source: Thigpen and Volker, 2017

Compare to census data…

Does sample size matter?

Inferential Statistics for Probability Samples

Term | Definition
n | Sample size
Sample statistic | Statistic computed from sample, e.g. mean
Population parameter | True value of statistic, e.g. mean, for population
Sampling error | Population parameter minus sample statistic

We don’t know the population parameter! So we don’t know the sampling error!

But we can estimate a confidence interval…

Sample Statistic

Standard deviation = how close individual scores are to the sample mean

[Figure: distribution of scores around the sample mean]

Let’s say we take a bunch of samples…

Standard error = how close mean scores from repeated samples are to the population mean

[Figure: distribution of sample means around the population mean]
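The difference between standard deviation and standard error can be checked by actually taking "a bunch of samples". The sketch below uses a hypothetical population of scores (mean 5, standard deviation 2, both invented) and compares the standard error estimated from one sample with the spread of the means of many repeated samples.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 scores.
population = [rng.gauss(5.0, 2.0) for _ in range(10_000)]

n = 25
one_sample = rng.sample(population, n)
sd = statistics.stdev(one_sample)      # spread of individual scores
estimated_se = sd / n ** 0.5           # standard error estimated from one sample

# Take many samples and look at how much their means vary.
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(2_000)]
empirical_se = statistics.stdev(sample_means)

print(f"standard deviation of scores: {sd:.2f}")
print(f"standard error from one sample: {estimated_se:.2f}")
print(f"spread of 2,000 sample means: {empirical_se:.2f}")
```

The last two numbers should be close to each other and much smaller than the first, which is the point of the two figures: sample means cluster more tightly than individual scores.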

Calculating a confidence interval

$$\text{standard error} = \frac{\text{standard deviation}}{\sqrt{n}}$$

$$\text{confidence interval} = \text{statistic} \pm 2 \times \text{standard error}$$
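As a worked example of the two formulas, the sketch below computes the interval for a sample mean. The scores are hypothetical, and the function name and the `multiplier` parameter are illustrative; the default of 2 follows the approximate rule on the slide.

```python
import math
import statistics

def confidence_interval(scores, multiplier=2.0):
    """Return (low, high) = mean -/+ multiplier * standard error."""
    n = len(scores)
    mean = statistics.mean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    return mean - multiplier * standard_error, mean + multiplier * standard_error

# Hypothetical sample of 8 scores.
scores = [2, 4, 4, 5, 5, 6, 7, 7]
low, high = confidence_interval(scores)
print(f"mean = {statistics.mean(scores)}, interval = ({low:.2f}, {high:.2f})")
```

Here `statistics.stdev` is the sample standard deviation (dividing by n − 1), which is the usual choice when the population standard deviation is unknown.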

The 68-95-99 percent rule for confidence intervals

Calculating a confidence interval

$$\text{standard error} = \frac{\text{standard deviation}}{\sqrt{n}}$$

How do we reduce standard error?

More homogeneous populations mean tighter confidence intervals: SD ↓ → SE ↓ → CI ↓

Larger sample sizes mean tighter confidence intervals: n ↑ → SE ↓ → CI ↓
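Both relationships follow directly from the standard error formula, and a short table of numbers makes the pattern visible. The standard deviations and sample sizes below are hypothetical, and the function names are illustrative.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

def ci_width(sd, n, multiplier=2.0):
    """Full width of the interval mean -/+ multiplier * SE."""
    return 2 * multiplier * standard_error(sd, n)

for sd in (1.0, 2.0):
    for n in (25, 100, 400):
        print(f"SD={sd}  n={n:>3}  SE={standard_error(sd, n):.3f}  CI width={ci_width(sd, n):.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error, which is one reason a larger sample helps only up to a point.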

$$\text{95\% confidence interval} = \text{mean} \pm 1.96 \times \text{standard error}$$

Example

[Figure: number of students (y-axis, 0.0 to 4.5) by pints per week (x-axis, 1 to 13), for UCD and CSUC]

Calculation of Confidence Intervals | UCD | CSUC
Mean | 4.1 | 6.0
Standard Deviation | 2.54 | 3.32
Standard Error if n=20 (= SD/√n) | 1.26 | 1.36
95% CI low (= mean − 2 × SE) | 1.57 | 3.34
95% CI high (= mean + 2 × SE) | 6.53 | 8.66

Are the means different? i.e. do the confidence intervals overlap?


Mean | 4.1 | 6.0

95% CI low | 1.57 | 3.34
95% CI high | 6.53 | 8.66

95% CI low | 3.70 | 5.54
95% CI high | 4.40 | 6.46
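The overlap question can be answered mechanically for the two pairs of intervals listed above. This is a minimal sketch of the rule of thumb used on the slide (non-overlapping intervals suggest the means differ); the function and variable names are illustrative.

```python
def intervals_overlap(a, b):
    """True if intervals a = (low, high) and b = (low, high) share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# First pair of intervals (UCD, CSUC) from the table.
wide_ucd, wide_csuc = (1.57, 6.53), (3.34, 8.66)
# Second pair of intervals (UCD, CSUC) from the table.
narrow_ucd, narrow_csuc = (3.70, 4.40), (5.54, 6.46)

print(intervals_overlap(wide_ucd, wide_csuc))
print(intervals_overlap(narrow_ucd, narrow_csuc))
```

The first pair overlaps, so this check cannot separate the two means; the second, tighter pair does not overlap.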

Are the means different?

Another way to think about sample size: Power Analysis

• Relates to experimental design: effect size

Power = probability that the statistical analysis detects the effect
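Power can be estimated by simulation: generate many experiments with a known effect and count how often the analysis detects it. The sketch below is one simple way to do that, under stated assumptions: normally distributed outcomes, a hypothetical effect of 1.0 with standard deviation 2.0, and "detection" defined as the difference in means exceeding 1.96 standard errors. None of these choices come from the slides.

```python
import math
import random
import statistics

def estimate_power(effect, sd, n_per_group, trials=1000, seed=0):
    """Share of simulated experiments in which the effect is detected."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_group)]
        diff = statistics.mean(treated) - statistics.mean(control)
        se_diff = math.sqrt(
            statistics.variance(control) / n_per_group
            + statistics.variance(treated) / n_per_group
        )
        if abs(diff) > 1.96 * se_diff:   # assumed detection rule
            detected += 1
    return detected / trials

for n in (10, 30, 100):
    print(f"n per group = {n:>3}: estimated power = {estimate_power(1.0, 2.0, n):.2f}")
```

Running this with a smaller effect or a larger standard deviation shows why small effects need larger samples to be detected reliably.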

Sample size matters

Larger sample is better – up to a point

[Figure: two diagrams, each showing a sample within a population]

Percent women based on random sample…

Sample Size | Margin of Error | Range for Estimate
5 | 44% | 6 – 94%
10 | 31% | 19 – 81%
20 | 22% | 28 – 72%
30 | 17% | 33 – 67%
40 | 15% | 35 – 65%
50 | 13% | 37 – 63%

http://www.methodspace.com/group/qualitativeinterviewing/forum/topics/two-problems-with-random-sampling-and-two-alternatives
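A table like this can be approximated with the usual margin-of-error formula for a proportion. The sketch assumes a true proportion of 0.5 and a 95% level (multiplier 1.96); both are assumptions, since the source of the table does not state how it was computed, and the results may differ from the table by a percentage point in some rows.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p estimated from n cases."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (5, 10, 20, 30, 40, 50):
    moe = margin_of_error(n)
    low, high = 0.5 - moe, 0.5 + moe
    print(f"n={n:>2}  margin of error = {moe:.0%}  range = {low:.0%} to {high:.0%}")
```

Going from 5 to 20 cases cuts the margin of error in half, while going from 20 to 50 narrows it far less.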

A few examples

Prof. Handy’s Cul-de-Sac Sampling Plan:

The household survey will provide the primary data for testing the hypothesis. We expect the design of the sampling plan for the survey and the design of the survey instrument itself to be particularly challenging for this study. The target population for this study is children living in houses located on cul-de-sacs and through streets in the Sacramento region. Because no sampling frame exists for this population, we will use a multi-stage cluster sampling strategy. First, residential neighborhoods throughout the region will be defined based on major roadways and other geographic features. Census data will be used to eliminate neighborhoods built before 1950 because of the infrequent use of cul-de-sacs in residential developments before this time. From the remaining post-1950 neighborhoods, a random sample of neighborhoods will be selected. Within these neighborhoods, cul-de-sacs will be identified using the 2000 Census street network and the capabilities of geographic information systems (GIS). A random sample of the cul-de-sacs within each neighborhood will be chosen. For each cul-de-sac in the sample, a segment of a nearby through street (defined as a street that links arterial streets and carries significant levels of through traffic) of otherwise similar characteristics and similar length will be chosen. This approach creates matched pairs of streets. Next, all addresses on the sample street pairs will be compiled to create a sample of households. The sample of streets will be used in the field observations, the sample of households will be used in the household survey, and a sub-sample of the sample of households will be used in the in-depth interviews, as described below.

Handy, et al., 2005
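The staged structure of this plan (sample neighborhoods, sample cul-de-sacs within them, then match each to a through street) can be sketched in code. This is a simplified illustration, not the study's procedure: the street data are randomly generated, the names are hypothetical, and matching uses street length only, whereas the plan also calls for nearby streets with otherwise similar characteristics.

```python
import random

def sample_street_pairs(neighborhoods, n_neighborhoods, n_culdesacs, seed=None):
    """Stage 1: sample neighborhoods. Stage 2: sample cul-de-sacs within each.
    Then pair each cul-de-sac with the through street closest in length."""
    rng = random.Random(seed)
    pairs = []
    for name in rng.sample(sorted(neighborhoods), n_neighborhoods):
        streets = neighborhoods[name]
        culdesacs = [s for s in streets if s["type"] == "cul-de-sac"]
        through = [s for s in streets if s["type"] == "through"]
        for cds in rng.sample(culdesacs, n_culdesacs):
            # Simplification: the same through street may be matched twice.
            match = min(through, key=lambda s: abs(s["length_m"] - cds["length_m"]))
            pairs.append((name, cds["name"], match["name"]))
    return pairs

# Hypothetical data: 10 neighborhoods, each with 5 cul-de-sacs and 5 through streets.
data_rng = random.Random(0)
neighborhoods = {
    f"neighborhood_{k}": [
        {"name": f"{kind}_{i}", "type": kind, "length_m": data_rng.randint(80, 300)}
        for kind in ("cul-de-sac", "through")
        for i in range(5)
    ]
    for k in range(10)
}

pairs = sample_street_pairs(neighborhoods, n_neighborhoods=3, n_culdesacs=2, seed=1)
for pair in pairs:
    print(pair)
```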


Other things to think about!

Goal | Example 1: Child as unit of analysis | Example 2: Neighborhood as unit
Ensuring that the independent variable varies | Sample includes children who live on cul-de-sacs and children who don’t | Sample includes neighborhoods that have lots of cul-de-sacs and neighborhoods that have few cul-de-sacs
Ensuring that the control variables don’t vary (aka “elimination”) | Sample includes only children from moderate-income households | Sample includes only neighborhoods with average income in the moderate range

Types of Sampling

Type | Definition
Probability sampling (i.e. random) | Every element in the population has a known, non-zero probability of being selected; sampling involves random selection (equal chance under simple random sampling)
Non-probability sampling (i.e. non-random) | Do not know in advance how likely it is that any element of the population will be selected for the sample; non-random selection (not equal chance)

Non-Probability Sampling

Method | Definition
Availability or Convenience Sampling | Cases selected because they’re easy to find
Quota Sampling (proportional or not) | Groups defined by key characteristics; specified number of cases selected in each group
Purposive Sampling | Individuals selected for sample because they possess a unique trait
Expert Sampling | Individuals selected for sample because of their knowledge (“key informants”)
Snowball Sampling | Start with initial sample, ask them to recommend other participants

Used for exploratory and qualitative research. Must be very cautious about generalizing!
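Snowball sampling is the most procedural of these methods, so it is the one sketched here. The referral network and names below are hypothetical: each person lists who they would recommend, and the sample grows wave by wave from the initial participants.

```python
def snowball_sample(referrals, initial, waves):
    """Start with the initial sample, then add referred participants wave by wave."""
    sampled = list(initial)
    current = list(initial)
    for _ in range(waves):
        next_wave = []
        for person in current:
            for referred in referrals.get(person, []):
                if referred not in sampled and referred not in next_wave:
                    next_wave.append(referred)
        sampled.extend(next_wave)
        current = next_wave
    return sampled

# Hypothetical referral network: person -> people they would recommend.
referrals = {
    "ana": ["ben", "cy"],
    "ben": ["dee"],
    "cy": ["ana", "eli"],
    "dee": [],
    "eli": ["fay"],
    "gus": ["ana"],
}

sample = snowball_sample(referrals, initial=["ana"], waves=2)
print(sample)
```

In this network nobody recommends "gus", so no number of waves would reach that person, which illustrates why generalizing from such a sample calls for caution.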

To do

• Sampling exercise on Tuesday (2/7)
• Read and study for midterm (2/14)
• Don’t forget lecture slides and lecture summary notes on website!
• Office Hours
  • Calvin: Wed 9-11 AM
  • Dillon: Fri 8-9 + 10-11 AM
• Paper critique in discussion section on Friday
  • READ Lubell et al. 2009 before discussion tomorrow!
