Sampling
ESP 178 Applied Research Methods
Calvin Thigpen
2/2/17
Adapted from lecture by Professor Susan Handy
Ethics
Hypothetical example
Movie Trailer BBC story
To Test Housing Program, Some Are Denied Aid
New York Times, 12/8/2010
“Half of the test subjects — people who are behind on rent and in danger of being evicted —are being denied assistance from the program for two years, with researchers tracking them to see if they end up homeless.”
http://www.nytimes.com/2010/12/09/nyregion/09placebo.html?pagewanted=all&_r=0
“Moving to Opportunity for Fair Housing (MTO) is a 10-year research demonstration that combines tenant-based rental assistance with housing counseling to help very low-income families move from poverty-stricken urban areas to low-poverty neighborhoods.”
Moving to Opportunity
http://portal.hud.gov/hudportal/HUD?src=/programdescription/mto
Treatment group: Randomly selected households with children receive housing counseling and vouchers that must be used in areas with less than 10 percent poverty
Control groups: One already receiving vouchers, one just coming into voucher program
Ethical principles
• Minimize harm
• No deception
• Informed consent
• Identity protection
• Distribute benefits equitably
What we’ve covered…
Aspect of Research | Research Step | Type of Validity
What to study | Conceptualization and operationalization | Measurement
Who to study | Sampling | External (Generalizability)
How to study it | Research design | Internal (Causal)
Sampling
Sampling is the (statistical) process of selecting a subset of a population of interest for purposes of making observations and (statistical) inferences about that population.
How it works
Unit of Observation
Population: Who we want to know about or generalize to. People living in selected area
Sampling Frame: A list of units within population from which you draw your sample
Sample: Who we collect data from
Types of Sampling
Type | Definition
Probability sampling (i.e., random) | Every element in the population has a known, non-zero probability of being selected; sampling involves random selection (in the simplest case, an equal chance for every element)
Non-probability sampling (i.e., non-random) | The probability that any given element of the population will be selected for the sample is not known in advance; selection is non-random (not an equal chance)
Probability sampling
Goal is a representative sample: one that resembles the population of interest
[Figure: a sample drawn from the population]
Generalizability
Sample generalizability: from the sample to the population it was drawn from
Cross-population generalizability: from that population to a different population
Sampling Error
Difference between characteristics of sample and characteristics of population
Random sampling error
Inherent in process of sampling! Measure it with confidence intervals (see below).
Systematic sampling error
Depends on how good your sampling method is! Use good methods to minimize this.
The Sampling Frames Challenge
Issues?
Alternatives?
Probability Sampling Methods
Method | Definition
Simple Random Sampling | Elements chosen completely by chance
Systematic Random Sampling | Select first element randomly, then pick every nth element
To select first element, then every 10th element
OR to select element on each page
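The two methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 100 numbered households is hypothetical, and the function names are made up for this example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements completely by chance, without replacement."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_random_sample(frame, interval, seed=None):
    """Pick a random starting element, then every interval-th element after it."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random position within the first interval
    return frame[start::interval]

# Hypothetical sampling frame: 100 numbered households.
frame = list(range(1, 101))

srs = simple_random_sample(frame, 10, seed=1)
sys_sample = systematic_random_sample(frame, 10, seed=1)  # every 10th element

print("Simple random:", sorted(srs))
print("Systematic:   ", sys_sample)
```

Note that systematic sampling only behaves like a random sample if the frame has no periodic pattern that lines up with the interval.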
Stratified Random Sampling – Proportionate | Sort population into strata; randomly select within strata; proportionate to population
If sampling frames are available for strata but not for the overall population.
To have more homogeneous samples (see below).
Stratified Random Sampling – Disproportionate | Sort population into strata; randomly select within strata; disproportionate to population
To ensure enough elements in small strata.
Strata defined by some characteristic…
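As a rough sketch of the difference between proportionate and disproportionate stratified sampling, the Python below allocates a sample across two strata both ways. The population (900 students, 100 employees) and all names are hypothetical, chosen only to show how a small stratum ends up with few cases under proportionate allocation.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, sizes, seed=None):
    """Randomly select sizes[s] elements within each stratum s."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)
    return {s: rng.sample(strata[s], k) for s, k in sizes.items()}

def proportionate_sizes(frame, stratum_of, n):
    """Allocate n across strata by population share (rounding may shift the total)."""
    counts = defaultdict(int)
    for element in frame:
        counts[stratum_of(element)] += 1
    return {s: round(n * c / len(frame)) for s, c in counts.items()}

def role(element):
    return element[0]

# Hypothetical population: 900 students and 100 employees.
frame = [("student", i) for i in range(900)] + [("employee", i) for i in range(100)]

prop_sizes = proportionate_sizes(frame, role, 50)   # mirrors the 90/10 split
disp_sizes = {"student": 25, "employee": 25}        # oversamples the small stratum

proportionate = stratified_sample(frame, role, prop_sizes, seed=1)
disproportionate = stratified_sample(frame, role, disp_sizes, seed=1)

print({s: len(v) for s, v in proportionate.items()})
print({s: len(v) for s, v in disproportionate.items()})
```

With a disproportionate design, estimates for the whole population need to be weighted back to the true stratum shares.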
Campus Travel Survey Sampling
http://its.ucdavis.edu/research/publications/publication-detail/?pub_id=2537
Campus Mode Share
Cluster Sampling | Draw random sample of clusters, then select elements within clusters
Cluster = naturally occurring grouping, e.g. neighborhoods, classes, etc. Often used for practical purposes – if in-person data collection.
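A minimal sketch of cluster sampling, assuming a hypothetical region of 8 neighborhoods with 20 households each. The function name and the data are invented for illustration; the second stage here draws a random subset within each chosen cluster, though taking every element in the cluster is also common.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Randomly pick clusters, then select elements within the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: sample clusters
    sample = {}
    for name in chosen:
        members = clusters[name]
        if n_per_cluster is None:
            sample[name] = list(members)  # take everyone in the cluster
        else:
            k = min(n_per_cluster, len(members))
            sample[name] = rng.sample(members, k)  # stage 2: sample within cluster
    return sample

# Hypothetical clusters: 8 neighborhoods, 20 households each.
clusters = {
    f"neighborhood_{i}": [f"n{i}_household_{j}" for j in range(20)]
    for i in range(8)
}

sample = cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=7)
for name, households in sample.items():
    print(name, households)
```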
Exit Polling
Difference between Stratified Random and Cluster Sampling
Both: Divide population into groups
Stratified Random: Randomly sample units within every group
Cluster: Draw random sample of groups, then sample units within the selected groups
Matched-pairs Sampling | Divide population into two groups by key characteristic; draw random sample in Group 1, then find matches in Group 2
How do you know if you have a representative sample?
Source: Thigpen and Volker, 2017
Compare to census data…
Does sample size matter?
Inferential Statistics for Probability Samples
Term | Definition
n | Sample size
Sample statistic | Statistic computed from sample, e.g. mean
Population parameter | True value of statistic, e.g. mean, for population
Sampling error | Population parameter − sample statistic
We don’t know the population parameter!
So we don’t know the sampling error!
But we can estimate a confidence interval…
Sample Statistic
Standard Deviation = how close individual scores are to the sample mean
[Figure: distribution of scores around the sample mean]
Let’s say we take a bunch of samples…
Standard error = how close mean scores from repeated samples are to the population mean
[Figure: distribution of sample means around the population mean]
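The idea of taking "a bunch of samples" can be simulated directly. The sketch below builds a hypothetical population (normally distributed scores with mean 50 and standard deviation 10, both invented for this example), draws many samples of size 25, and compares the spread of the sample means with the standard deviation divided by the square root of n.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 20,000 scores (mean 50, SD 10 are assumptions).
population = [rng.gauss(50, 10) for _ in range(20_000)]
population_mean = statistics.mean(population)

n = 25             # size of each sample
repetitions = 2_000

# Take many samples and record the mean of each one.
sample_means = [
    statistics.mean(rng.sample(population, n)) for _ in range(repetitions)
]

# Standard error observed empirically: how spread out the sample means are.
sd_of_means = statistics.stdev(sample_means)

# Standard error predicted from the population SD and the sample size.
predicted_se = statistics.pstdev(population) / n ** 0.5

print(f"Population mean:          {population_mean:.2f}")
print(f"Mean of sample means:     {statistics.mean(sample_means):.2f}")
print(f"SD of sample means:       {sd_of_means:.2f}")
print(f"Predicted standard error: {predicted_se:.2f}")
```

The last two printed values should be close to each other, which is why a single sample's standard deviation and size are enough to estimate the standard error in practice.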
Calculating a confidence interval
$$\text{standard error} = \frac{\text{standard deviation}}{\sqrt{n}}$$
$$\text{confidence interval} = \text{statistic} \pm 2 \times \text{standard error}$$
The 68-95-99.7 percent rule for confidence intervals
Calculating a confidence interval
$$\text{standard error} = \frac{\text{standard deviation}}{\sqrt{n}}$$
How do we reduce standard error?
More homogeneous populations mean tighter confidence intervals: SD ↓ → SE ↓ → CI ↓
Larger sample sizes mean tighter confidence intervals: n ↑ → SE ↓ → CI ↓
$$\text{95\% confidence interval} = \text{mean} \pm 1.96 \times \text{standard error}$$
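A small sketch of the calculation, starting from raw scores. The ten scores are hypothetical. The 1.96 multiplier is the normal approximation used on the slide; for a sample this small a t-based multiplier would give a somewhat wider interval, so treat the result as illustrative.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

def confidence_interval_95(scores):
    """Return (low, high) for mean +/- 1.96 x standard error."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation
    se = standard_error(sd, len(scores))
    return mean - 1.96 * se, mean + 1.96 * se

# Hypothetical sample of ten scores.
scores = [2, 4, 4, 5, 6, 3, 4, 7, 5, 4]
low, high = confidence_interval_95(scores)
print(f"95% CI: {low:.2f} to {high:.2f}")

# The two levers for a tighter interval: a smaller SD or a larger n.
print(standard_error(10, 100), standard_error(5, 100), standard_error(10, 400))
```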
Example
[Figure: number of students by pints per week (1 to 13), UCD vs. CSUC]
Calculation of Confidence Intervals | UCD | CSUC
Mean | 4.1 | 6.0
Standard Deviation | 2.54 | 3.32
Standard Error if n=20 | 1.26 | 1.36
95% CI low | 1.57 | 3.34
95% CI high | 6.53 | 8.66
Standard error $= SD/\sqrt{n}$
95% CI low $= \text{mean} - 2 \times SE$
95% CI high $= \text{mean} + 2 \times SE$
Are the means different? I.e., do the confidence intervals overlap?
Calculation of Confidence Intervals | UCD | CSUC
Mean | 4.1 | 6.0
Standard Deviation | 2.54 | 3.32
Standard Error if n=20 | 1.26 | 1.36
95% CI low | 1.57 | 3.34
95% CI high | 6.53 | 8.66
Standard Error if n=200 | 0.18 | 0.24
95% CI low | 3.70 | 5.54
95% CI high | 4.40 | 6.46
Are the means different?
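The overlap check for the n=200 case can be sketched as follows, using the means and standard deviations from the table. The helper names are made up for this example. Results may differ from the table in the second decimal place, on the assumption that the slide was computed from unrounded inputs.

```python
import math

def ci95(mean, sd, n):
    """Return (low, high, standard_error) using mean +/- 1.96 x SD/sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se, se

def intervals_overlap(a, b):
    """True if the (low, high) ranges of a and b share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Means and standard deviations from the table, with n = 200 per campus.
ucd = ci95(4.1, 2.54, 200)
csuc = ci95(6.0, 3.32, 200)

print(f"UCD:  SE={ucd[2]:.2f}, CI={ucd[0]:.2f} to {ucd[1]:.2f}")
print(f"CSUC: SE={csuc[2]:.2f}, CI={csuc[0]:.2f} to {csuc[1]:.2f}")
print("Intervals overlap:", intervals_overlap(ucd, csuc))
```

When the intervals do not overlap, as here, the difference between the two means is unlikely to be due to random sampling error alone.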
Another way to think about sample size – Power Analysis
• Relates to experimental design: effect size
Power = probability that the statistical analysis detects the effect
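One way to get a feel for power is to simulate many studies and count how often the effect is detected. The sketch below is a simplified illustration, not a full power analysis: it assumes two equal-sized groups with normally distributed outcomes, a hypothetical effect of 0.5 standard deviations, and "detection" defined as the 95% confidence interval for the difference in means excluding zero.

```python
import math
import random
import statistics

def estimated_power(effect, sd, n, trials=1000, seed=0):
    """Share of simulated two-group studies in which the effect is detected."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, sd) for _ in range(n)]
        treated = [rng.gauss(effect, sd) for _ in range(n)]
        diff = statistics.mean(treated) - statistics.mean(control)
        # Standard error of the difference between two independent means.
        se = math.sqrt(
            statistics.variance(control) / n + statistics.variance(treated) / n
        )
        if abs(diff) > 1.96 * se:  # 95% CI for the difference excludes zero
            detected += 1
    return detected / trials

# Hypothetical effect size of 0.5 SD, with 20 vs. 100 cases per group.
power_small = estimated_power(effect=0.5, sd=1.0, n=20)
power_large = estimated_power(effect=0.5, sd=1.0, n=100)

print(f"Estimated power with n=20 per group:  {power_small:.2f}")
print(f"Estimated power with n=100 per group: {power_large:.2f}")
```

Larger samples and larger effects both raise power; the simulation makes it possible to try different combinations before collecting data.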
Sample size matters
Larger sample is better – up to a point
[Figure: two population-and-sample diagrams]
Percent women based on random sample…
Sample Size | Margin of Error | Range for Estimate
5 | 44% | 6–94%
10 | 31% | 19–81%
20 | 22% | 28–72%
30 | 17% | 33–67%
40 | 15% | 35–65%
50 | 13% | 37–63%
http://www.methodspace.com/group/qualitativeinterviewing/forum/topics/two-problems-with-random-sampling-and-two-alternatives
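The source does not state how the margins in the table were computed. As an assumption, the sketch below uses the common approximation for a proportion, 1.96 × sqrt(p(1 − p)/n) with p = 0.5, which reproduces the table to within about one percentage point.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion (assumed formula)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (5, 10, 20, 30, 40, 50):
    moe = margin_of_error(n)
    low, high = 0.5 - moe, 0.5 + moe
    print(f"n={n:>2}: margin {moe:.0%}, range {low:.0%} to {high:.0%}")
```

The margin shrinks with the square root of n, which is why doubling the sample size does not halve the margin and why a larger sample is better only up to a point.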
A few examples
Prof. Handy’s Cul-de-Sac Sampling Plan:
The household survey will provide the primary data for testing the hypothesis. We expect the design of the sampling plan for the survey and the design of the survey instrument itself to be particularly challenging for this study. The target population for this study is children living in houses located on cul-de-sacs and through streets in the Sacramento region. Because no sampling frame exists for this population, we will use a multi-stage cluster sampling strategy. First, residential neighborhoods throughout the region will be defined based on major roadways and other geographic features. Census data will be used to eliminate neighborhoods built before 1950 because of the infrequent use of cul-de-sacs in residential developments before this time. From the remaining post-1950 neighborhoods, a random sample of neighborhoods will be selected. Within these neighborhoods, cul-de-sacs will be identified using the 2000 Census street network and the capabilities of geographic information systems (GIS). A random sample of the cul-de-sacs within each neighborhood will be chosen. For each cul-de-sac in the sample, a segment of a nearby through street (defined as a street that links arterial streets and carries significant levels of through traffic) of otherwise similar characteristics and similar length will be chosen. This approach creates matched pairs of streets. Next, all addresses on the sample street pairs will be compiled to create a sample of households. The sample of streets will be used in the field observations, the sample of households will be used in the household survey, and a sub-sample of the sample of households will be used in the in-depth interviews, as described below.
Handy, et al., 2005
Other things to think about!
Goal | Example 1: Child as unit of analysis | Example 2: Neighborhood as unit
Ensuring that the independent variable varies | Sample includes children who live on cul-de-sacs and children who don’t | Sample includes neighborhoods that have lots of cul-de-sacs and neighborhoods that have few cul-de-sacs
Ensuring that the control variables don’t vary (aka “elimination”) | Sample includes only children from moderate-income households | Sample includes only neighborhoods with average income in the moderate range
Types of Sampling
Type | Definition
Probability sampling (i.e., random) | Every element in the population has a known, non-zero probability of being selected; sampling involves random selection (in the simplest case, an equal chance for every element)
Non-probability sampling (i.e., non-random) | The probability that any given element of the population will be selected for the sample is not known in advance; selection is non-random (not an equal chance)
Non-Probability Sampling
Method | Definition
Availability or Convenience Sampling | Cases selected because they’re easy to find
Quota Sampling (proportional or not) | Groups defined by key characteristics; specified number of cases selected in each group.
Purposive Sampling | Individuals selected for sample because they possess a unique trait
Expert Sampling | Individuals selected for sample because of their knowledge – “key informants”
Snowball Sampling | Start with initial sample, ask them to recommend other participants
Used for exploratory and qualitative research.
Must be very cautious about generalizing!
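To make the contrast with probability sampling concrete, here is a rough sketch of quota sampling. The passers-by, the travel-mode groups, and the quotas are all hypothetical. The key point is that cases are taken in whatever order they happen to turn up, not drawn at random, so selection probabilities are unknown.

```python
def quota_sample(cases, group_of, quotas):
    """Accept cases in arrival order until each group's quota is filled."""
    remaining = dict(quotas)
    selected = []
    for case in cases:  # arrival order is a matter of convenience, not chance
        group = group_of(case)
        if remaining.get(group, 0) > 0:
            selected.append(case)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # every quota is full
    return selected

# Hypothetical passers-by, listed in the order they were approached.
passers_by = [
    ("p1", "bike"), ("p2", "bike"), ("p3", "car"), ("p4", "bike"),
    ("p5", "bus"), ("p6", "car"), ("p7", "bus"), ("p8", "car"),
]
quotas = {"bike": 2, "car": 2, "bus": 1}

selected = quota_sample(passers_by, lambda case: case[1], quotas)
print(selected)
```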
To do
• Sampling exercise on Tuesday (2/7)
• Read and study for midterm (2/14)
• Don’t forget lecture slides and lecture summary notes on website!
• Office Hours
  • Calvin: Wed 9-11 AM
  • Dillon: Fri 8-9 + 10-11 AM
• Paper critique in discussion section on Friday
  • READ Lubell et al. 2009 before discussion tomorrow!