366 5. Sampling Defined / The idea – Making inference about a larger population What is the population – Some particular value in the population estimating

366

5

Sampling

• Defined / The idea
  – Making inference about a larger population
• What is the population
  – Some particular value in the population
    • estimating a parameter

Sampling

• Population must be defined
  – If interested in opinions of...
    • All adults
    • Registered voters
    • Likely voters
    • Actual voters

• These are all distinct populations

Sampling

• Population must be defined
  – If interested in opinions of...
    • People in Whatcom County
    • Voters in Whatcom County
    • People in Bellingham
    • Voters in Bellingham
    • Likely voters in Bellingham

• These are all distinct populations

Sampling

• Population must be defined
  – If interested in opinions of...
    • Students at WWU
    • Seniors at WWU (xxx # of credits & up)
    • Students in College of Arts & Sciences
    • etc.

• These are all distinct populations; who should be included, who excluded?

Sampling

• Sampling unit
  – A single member of the population
    • a case
  – If population = conflicts (wars)
    • sampling unit = nations of a certain size

Sampling

• Sampling Frame

• Once clear about what the population & units are, how do we find them?
  – Frame = complete list of population
    • Registered voters; Students at WWU
  – In reality this may not exist
    • e.g., all people living in the US

Sampling

• Sampling Frame
• US Census
  – How to get ‘the list’?
  – $3 billion; 500,000 workers...

Sampling

• Sampling Frame

• Registered voters; Students at WWU

– Piece of cake?

– Accuracy of sample depends on comprehensiveness of frame

Sampling

• Sampling Frame

• Ahead of time, evaluate for problems
  – Missing elements
    • New residents, newly registered voters, ?
  – Clusters
    • Census tracts, city blocks, Zip code, Area code, prefix
  – Take random draw of clusters, then random draw of households in cluster

Sampling

• Sampling Frame

• Ahead of time, evaluate for problems
  – Blank elements
    • Phone directories (address w/o #)
    • Phone #s (unassigned prefixes; fax machine; pager)
    • List of all residents when population = voters

Classic Sample Failure

• 1936 Literary Digest Survey
  – Survey of 2.4 million Americans
  – Predicted Alf Landon 57%, FDR 43%
  – Actual result: FDR 62%, Landon 38%
  – Frame = 10 million people
    • subscribers to Digest; phone directories; club memberships


• 1936 Literary Digest Survey

– What went wrong?


• 2000 & 2004 & 2012 (WI) US Exit polls
  – Surveys of tens of thousands
  – 2000 initially predicted Gore win FL
    • Actually, Bush won
  – 2004 initially predicted Kerry win OH
    • Actually, Bush won
• Frame:
  – Key precincts, people voting at polling places

2004 NEP Exit Polls, Ohio

“This can’t happen in America. Maybe in Ohio...”

• http://www.youtube.com/watch?v=ArC7XarwnWI

• 2008

• http://www.youtube.com/watch?v=IoWJkrlptNs





• 2000 & 2004 US Exit polls
  – What went (goes) wrong?
  – also response bias that favors Democrats

Sample Designs

• Probability vs. Non-probability sampling
  – Probability sample
    • We know the probability that each unit in the population has of being in the sample
  – Non-probability sample
    • We don’t know if every unit has a fixed chance of being in sample

Sample Design

• Probability sample
  – If 22% of population are white males over 21 years of age...
  – a .22 probability that each unit drawn into the sample is a white male over 21

Sample Design

• Probability sample
  – If study repeated w/ different samples, high likelihood that results similar
  – We can estimate likelihood that things observed in the sample are representative of the population
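
The idea that repeated probability samples give similar results can be checked with a small simulation. The sketch below assumes a made-up population of 100,000 units in which 22% belong to the group of interest, and draws 200 simple random samples of 1,000 each. The population size, sample size, number of repetitions, and seed are illustrative assumptions, not values from the lecture.

```python
import random

# Hypothetical population: 1 = in the group of interest (22%), 0 = not
population = [1] * 22_000 + [0] * 78_000
true_share = sum(population) / len(population)

rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = []
for _ in range(200):
    sample = rng.sample(population, 1_000)   # one simple random sample
    estimates.append(sum(sample) / len(sample))

mean_estimate = sum(estimates) / len(estimates)
print(f"true share      : {true_share:.3f}")
print(f"mean of samples : {mean_estimate:.3f}")
print(f"range of samples: {min(estimates):.3f} to {max(estimates):.3f}")
```

The sample estimates should cluster around the true share, and the width of that cluster is what lets us attach a likelihood to any single sample's result.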

Sample Design

• Real world probability sample problems
  – Population = likely voters
  – Good sample frame?
    • Voters yes, likely voters no
  – Proper randomization
    • You try it
  – Missing elements
    • Land line vs. cell phones

Probability Samples

• Simple random sampling
• Systematic samples
• Stratified samples
• Cluster samples

Probability Samples

• Simple random sampling
  – List each unit (person) in population
  – Give each a number (List from 1 to n)
  – Use random # generator
  – If 1207 comes up, select #1207 from list
  – Repeat
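
The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical frame of 5,000 numbered units and a sample of 200; the function name `simple_random_sample` and the seed are made up for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each equally likely."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no unit is picked twice
    return rng.sample(frame, n)

# Hypothetical frame: every unit in the population numbered 1..5000
frame = list(range(1, 5001))
sample = simple_random_sample(frame, n=200, seed=42)
print(sorted(sample)[:10])  # first few selected unit numbers
```

Drawing without replacement stands in for the "repeat" step: the generator keeps producing numbers until 200 different units have been selected.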

Probability Samples

• Systematic sample
  – Have list of population, 1st to nth
  – Find random #, start there on list
  – Pick every kth unit (person) on list
  – Hope there is no structure to list
    • Starting point random, increment random
  – Easier
    • Kind of how exit polls work at polling place
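
A minimal sketch of a systematic draw, assuming the common variant in which the interval k is fixed at (frame size) / (sample size) and only the starting point is random. The frame of 1,000 units, the sample of 50, and the function name are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every kth unit from the frame."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    rng = random.Random(seed)
    k = len(frame) // n        # sampling interval
    start = rng.randrange(k)   # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame of 1,000 numbered units; interval k = 20
frame = list(range(1, 1001))
sample = systematic_sample(frame, n=50, seed=1)
print(sample[:5])
```

If the list has a periodic structure that lines up with k (for example, every 20th entry is a corner house), the sample inherits that structure, which is why the slide says to hope the list has none.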

Probability Sample

• Stratified sample
  – Use available information from the population
  – Divide so elements w/in groups (strata) are more alike than population
  – A series of homogeneous groups
    • Race/ethnicity; income
  – Combine samples into one

• Cheaper
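
One way to sketch a stratified draw is with proportional allocation: sample each stratum separately, in proportion to its share of the population, then combine. Proportional allocation is an assumption here (the slide does not say how the sample is split across strata), and the income bands and counts are invented for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, seed=None):
    """Random sample within each stratum, sized by the stratum's share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding can shift the total by a unit or two
        k = round(n * len(members) / len(units))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: (id, income band) with 600 low, 300 middle, 100 high
units = ([(i, "low") for i in range(600)]
         + [(i, "middle") for i in range(600, 900)]
         + [(i, "high") for i in range(900, 1000)])
sample = stratified_sample(units, stratum_of=lambda u: u[1], n=100, seed=7)
counts = {band: sum(1 for u in sample if u[1] == band)
          for band in ("low", "middle", "high")}
print(counts)
```

Because each stratum is sampled on its own, the combined sample matches the population's split across strata by construction rather than by luck.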

Probability Samples

• Cluster sample
  – Identify clusters (groups)
  – Select large groups at random
    • Cities, congressional districts, states, neighborhoods
  – Randomly sample within cluster
  – Cheaper, no list of national US voters; consider face to face interviews
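
The two stages (random draw of clusters, then random draw within each chosen cluster) can be sketched as follows. The 20 tracts of 50 households each, the names, and the sample sizes are hypothetical; note that a full household list is only needed for the clusters that get selected.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # stage 1
    sample = {}
    for name in chosen:
        members = clusters[name]
        take = min(n_per_cluster, len(members))         # small clusters: take all
        sample[name] = rng.sample(members, take)        # stage 2
    return sample

# Hypothetical frame: 20 census tracts with 50 households each
clusters = {
    f"tract_{t:02d}": [f"tract_{t:02d}-hh_{h:02d}" for h in range(50)]
    for t in range(20)
}
sample = two_stage_cluster_sample(clusters, n_clusters=4, n_per_cluster=10, seed=3)
for name, households in sample.items():
    print(name, households[:3])
```

Interviewers then only travel to four tracts instead of twenty, which is where the cost saving for face to face interviews comes from.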

Probability Samples

• Simple random sampling
• Systematic samples
• Stratified samples
• Cluster samples

• Other types, some of these used together

Non-probability Samples

• Convenience sample
  – All students in this class
    • Population = WWU students
  – First 200 people walking down Railroad Ave.
    • Population = Whatcom County voters
  – No way to know representativeness of sample

Non-probability samples

• Purposive sample
  – Units selected subjectively
  – Chance of being selected depends on researcher’s judgment
  – “Critical elections”
    • Population = all US Presidential elections
  – “Major wars”
    • Population = all wars

Non-probability sample

• Quota sample
  – Purposively select sample as representative as possible
  – Use known characteristics of population
  – Target quota based on known characteristics


• Quota sample
  – WWU (Fake example)
    • 57% female, 43% male
    • 45% A&S; 25% CST; 10% CBE; 10% Huxley; 10% other
    • Age
    • Ethnicity


• Quota sample
  – Whatcom Co. (Fake example)
    • Gender
    • Age
    • Partisanship
    • City resident vs. County resident

• Monitor demographics of respondents as you go
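
Monitoring quotas as responses come in can be sketched with a simple counter that accepts a respondent only while their group's quota is still open. The tiny targets and the stream of respondents below are made up, and a real quota design would usually cross several characteristics at once.

```python
def make_quota_tracker(targets):
    """Return an accept() function and the running counts it updates."""
    counts = {group: 0 for group in targets}

    def accept(group):
        # Take the respondent only if their group's quota is not yet filled
        if counts.get(group, 0) < targets.get(group, 0):
            counts[group] += 1
            return True
        return False

    return accept, counts

# Hypothetical quotas for a very small sample
accept, counts = make_quota_tracker({"female": 2, "male": 1})
stream = ["female", "male", "male", "female", "female"]
decisions = [accept(person) for person in stream]
print(decisions, counts)
```

Who ends up filling each quota still depends on who happens to be reachable, which is why a quota sample remains a non-probability sample.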


• Quota sample
  – Poor person’s random sampling
  – Can fail to predict
  – 1948: 3 surveys predicted Dewey to win
  – None targeted partisanship

Internet Samples

• Opt-in

• Provide people computers

• Huge samples asked to do interviews

• “Weight” data after responses to represent population
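
One common way to weight responses after the fact is post-stratification: each respondent gets a weight equal to their group's population share divided by its sample share. The sketch below assumes that method (the slide does not name one), borrows the 57% / 43% split from the fake WWU example, and invents the sample composition and answers.

```python
from collections import Counter

def poststrat_weights(sample_groups, population_shares):
    """Weight = population share of the group / sample share of the group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]

# Hypothetical opt-in sample: 70 women and 30 men (women over-represented)
groups = ["F"] * 70 + ["M"] * 30
# Made-up answers (1 = yes): 42 of 70 women and 9 of 30 men say yes
answers = [1] * 42 + [0] * 28 + [1] * 9 + [0] * 21

weights = poststrat_weights(groups, {"F": 0.57, "M": 0.43})
unweighted = sum(answers) / len(answers)
weighted = sum(w * a for w, a in zip(weights, answers)) / sum(weights)
print(f"unweighted: {unweighted:.3f}  weighted: {weighted:.3f}")
```

Weighting corrects the mix of groups in the sample, but it cannot fix differences between the people who opted in and the people who did not within the same group.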

Sample size

• If sample random (ish), precision of estimates depends on size

• Larger = more precise estimate, all else equal

• Going from large to very large doesn’t add much precision

Sample size

• Diminishing returns on size

• Depends on scale of population, subgroups
  – Whatcom Co.
  – State of WA
  – USA
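
Diminishing returns can be made concrete with the usual 95% margin of error for a proportion from a simple random sample, z * sqrt(p(1 - p) / n), optionally multiplied by the finite population correction sqrt((N - n) / (N - 1)). The sketch assumes p = 0.5 (the worst case) and z = 1.96; the population sizes are round illustrative numbers, not official counts.

```python
import math

def margin_of_error(n, p=0.5, z=1.96, population_size=None):
    """Approximate margin of error for a proportion from a random sample."""
    se = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        # Finite population correction: matters only when n is a large share of N
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return z * se

# Precision gained from larger samples (large population assumed)
for n in (100, 400, 1_600, 6_400):
    print(f"n = {n:>5}: +/- {100 * margin_of_error(n):.1f} points")

# Same n = 1,000 drawn from populations of very different (hypothetical) sizes
for label, big_n in (("county", 200_000), ("state", 7_000_000), ("nation", 300_000_000)):
    moe = margin_of_error(1_000, population_size=big_n)
    print(f"{label:>6}: +/- {100 * moe:.2f} points")
```

Quadrupling the sample only halves the margin of error. Subgroups are where size bites: an estimate for a subgroup is only as precise as the number of respondents in that subgroup allows.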

