Sampling, Andrew Martin, PS 372, University of Kentucky

Week 7 Sampling


Page 1: Week 7   Sampling

Sampling

Andrew Martin

PS 372, University of Kentucky

Page 2: Week 7   Sampling

What do polls tell us?

Page 3: Week 7   Sampling

Population

❖ If we want to assess American support for competing presidential candidates, we must clarify what we mean by Americans.

❖ In other words, we need to specify a population.

❖ A population is the complete set of relevant units of analysis.

❖ For the purpose of studying elections, the population is generally defined as the U.S. voting-age population (residents 18 and older).

Page 4: Week 7   Sampling

Sample

❖ Interviewing every voting-age American would be impossible; the time and money constraints are too great.

❖ However, it is possible (and more practical) to select a sample from the population for investigation.

❖ A sample is any subset of units collected in some manner from a population.

❖ The sample size and sampling method ultimately determine the quality of inferences that can be made about the population.

Page 5: Week 7   Sampling

Population vs. Sample

Study                         | Population                  | Sample
U.S. Voting                   | Voting-Age Pop.             | Gallup Poll Respondents
Military Coups                | All coups                   | Coups in Latin America in 1980s
Supreme Court Decision-making | All decisions on the merits | Merits decisions on Burger Court
Democratization               | All democratizing countries | Democratizing post-Cold War

Page 6: Week 7   Sampling

Populations and Samples

❖ Ideally, political scientists would like to measure and gather information about the population.

❖ Examples: Averages, differences between two groups, relationships among variables.

❖ If this information can be found and quantified for the entire population, the number is known as a population parameter.

❖ However, sample statistics allow political scientists to approximate the corresponding population values, or parameters.

Page 7: Week 7   Sampling

Population Parameters

❖ Population parameters are typically denoted by lowercase English or Greek letters, usually the Greek letter theta (θ).

❖ A proportion, such as the proportion of Americans who support the war in Iraq at a particular time, is typically designated as P or π.

Page 8: Week 7   Sampling

Sample Statistics

❖ Sample statistics are frequently expressed with a hat (^) over a character to denote that the quantity is a statistic, not a parameter. Sometimes lowercase p is used for a sample proportion.

Page 9: Week 7   Sampling

Statistics vs. Parameters

❖ For a population mean, the lowercase Greek letter mu (μ) is used.

❖ For the corresponding sample statistic, μ hat (μ̂) or Y bar (Ȳ) is used.

Page 10: Week 7   Sampling

Population vs. Samples

❖ An element is a single occurrence, realization or instance of the objects or entities being studied.

❖ A population can be subdivided into small groups known as strata.

❖ The elements in each stratum share one or more characteristics.

Page 11: Week 7   Sampling

Sampling

❖ The particular population from which a sample is actually drawn is called a sampling frame.

❖ Sampling frames are supposed to contain all elements that are part of the population of interest, but in practice are often incomplete.

❖ Example: Polling UK students using the annual student phone directory.

Page 12: Week 7   Sampling

1936 Presidential Election

❖ The Literary Digest predicted that Republican Alf Landon would defeat Democrat Franklin D. Roosevelt.

❖ The population: U.S. voters.

❖ The sampling frame: Telephone directories and automobile registration lists.

❖ Telephone and car ownership were not common then. The sample was not representative of the actual population because it overrepresented wealthy voters.

Page 13: Week 7   Sampling

Sampling

❖ Now, virtually everyone has a telephone. But some people have unlisted numbers.

❖ Researchers have developed random digit dialing to deal with this problem. A computer randomly selects telephone numbers, which is how people with unlisted numbers are contacted.

❖ However, not everyone owns a landline telephone. Millions of people are switching to cell phones, which will eventually force pollsters to change their methodology.

Page 14: Week 7   Sampling

Sampling

❖ Political science researchers like to use information collected in the sample to make inferences about the whole population.

❖ If the sampling frame is incomplete or inappropriate, sample bias will occur.

❖ This causes the sample to be unrepresentative of the population and can lead scholars to draw incorrect conclusions.
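The effect of an incomplete frame can be sketched with a small simulation. Everything here is hypothetical: the population size, the share of voters listed in the frame, and the support levels are invented purely to illustrate the mechanism, and the seed is arbitrary.

```python
import random

# Hypothetical population of 10,000 voters (all numbers invented for illustration):
# 3,000 appear in the sampling frame, and 60% of them back candidate A;
# 7,000 do not appear in the frame, and 35% of them back candidate A.
listed = [1] * 1800 + [0] * 1200       # 1 = supports candidate A
unlisted = [1] * 2450 + [0] * 4550
population = listed + unlisted

rng = random.Random(1936)              # arbitrary seed so the run is repeatable
frame_sample = rng.sample(listed, 1000)      # drawn from the incomplete frame
full_sample = rng.sample(population, 1000)   # drawn from the whole population

true_support = sum(population) / len(population)
frame_estimate = sum(frame_sample) / len(frame_sample)
full_estimate = sum(full_sample) / len(full_sample)

print("true support:", true_support)
print("estimate from incomplete frame:", frame_estimate)
print("estimate from full population:", full_estimate)
```

In this made-up setting, the frame-based estimate tracks support among listed voters rather than support in the population, no matter how large the sample is.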

Page 15: Week 7   Sampling

Sampling

❖ A probability sample is simply a sample for which each element in the total population has a known probability of being sampled.

❖ A nonprobability sample is one in which each element in the population has an unknown probability of being selected.

Page 16: Week 7   Sampling

Simple Random Sample

❖ In a simple random sample, each element and each combination of elements has an equal chance of being selected.

❖ However, this is often difficult to do in practice.

Page 17: Week 7   Sampling

Simple Random Sample

❖ During the Vietnam War, the Selective Service decided it would randomly draft soldiers by sampling days of the year.

❖ A drum contained 366 capsules with days of the year. Capsules were drawn, and men born on the day written on the capsule would be called to serve in the military unless exempted.

❖ However, the capsules must not have been properly mixed, because the Selective Service tended to oversample days during the last six months of the year.

Page 18: Week 7   Sampling

Simple Random Sample Method 1

❖ Example: We have 1,507 elements in the population and wish to draw a sample of 150.

❖ Every element is numbered, starting at 1 and ending at 1,507.

❖ Using a random number table, an element is selected each time its corresponding number appears.

❖ Any system of combining the numbers is acceptable as long as the numbers are random.
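A minimal Python sketch of the numbered-element approach above. It assumes the standard library's pseudorandom generator as a stand-in for a random number table (the slides do not name a tool), and the seed is arbitrary, used only to make the draw repeatable.

```python
import random

POPULATION_SIZE = 1507   # elements numbered 1 through 1,507, as in the example
SAMPLE_SIZE = 150

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw element numbers without replacement.

    Every element, and every combination of sample_size elements,
    is equally likely to be chosen.
    """
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)

sample = simple_random_sample(POPULATION_SIZE, SAMPLE_SIZE, seed=372)
print(sorted(sample)[:10])   # first few selected element numbers
```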

Page 19: Week 7   Sampling

Simple Random Sample Method 2

❖ Example: We have 1,507 elements in the population and wish to draw a sample of 150.

❖ All elements are represented on corresponding marbles and put in a hat, which is continuously and thoroughly mixed.

❖ Each element has an equal chance of being selected.

Page 20: Week 7   Sampling

Systematic Sample

❖ Elements are selected from a list at pre-determined intervals. In other words, they are chosen systematically rather than randomly.

❖ Every jth element on a list is selected. This number is known as the sampling interval.

❖ If you have a population of 2,000 on a list and you want a sample of 200, you can select every 10th element on the list for the sample.

❖ Usually the starting number is randomly selected. This is known as a random start.
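The procedure above can be sketched in a few lines of Python. The function name and the seed are illustrative assumptions; the population of 2,000 and sample of 200 come from the example.

```python
import random

def systematic_sample(elements, sample_size, seed=None):
    """Select every j-th element after a random start in the first interval."""
    interval = len(elements) // sample_size   # sampling interval j
    rng = random.Random(seed)
    start = rng.randrange(interval)           # random start: index 0 .. j-1
    return elements[start::interval][:sample_size]

population = list(range(1, 2001))             # a list of 2,000 elements
sample = systematic_sample(population, 200, seed=7)
print(len(sample), sample[:3])
```

Only the starting point is random here; once it is fixed, the rest of the sample is fully determined by the list order.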

Page 21: Week 7   Sampling

Systematic Sample

❖ Systematic samples could be biased if:

1. The elements on the list have been ranked according to a characteristic.

2. The list contains a pattern that corresponds to the sampling interval.
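The second risk can be illustrated with a hypothetical list in which every 10th entry differs from the rest (for example, corner units on a housing list). The list and its labels are invented for illustration only.

```python
# Hypothetical list of 2,000 housing units: every 10th entry is a corner unit.
units = ["corner" if i % 10 == 0 else "interior" for i in range(2000)]

interval = 10                                  # sampling interval
sample_start_0 = units[0::interval]            # random start happened to be 0
sample_start_3 = units[3::interval]            # random start happened to be 3

def corner_share(sample):
    """Share of corner units in a list of units."""
    return sample.count("corner") / len(sample)

print("population:", corner_share(units))
print("start 0:   ", corner_share(sample_start_0))
print("start 3:   ", corner_share(sample_start_3))
```

Because the pattern in the list has the same period as the sampling interval, each possible start yields a sample made up entirely of one type of unit.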

Page 22: Week 7   Sampling

Stratified Sample

❖ A stratified sample is a probability sample in which elements sharing one or more characteristics are grouped, and elements are selected from each group in proportion to the group’s representation in the total population.

❖ It is less difficult to draw a sample from a homogeneous population than a heterogeneous population.

Page 23: Week 7   Sampling

Stratified Sample

❖ Can be proportionate or disproportionate.

❖ In a proportionate sample, each stratum is represented in proportion to its size in the population.

❖ To determine the number of elements to sample in each stratum, a sampling fraction must be calculated.

Page 24: Week 7   Sampling

Proportionate Stratified Sample

❖ Ex: We have 8,000 freshmen, 5,000 sophomores, 4,000 juniors and 3,000 seniors we wish to poll about the presidential election. We want to sample 2,000 students.

❖ The sampling fraction would be the desired sample size divided by the population, so 2000/20000.

❖ Therefore, we would sample 10 percent of each stratum, which in this case is defined by school year.
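The allocation in this example can be worked through as a short sketch. The variable names are illustrative; the stratum sizes and the desired sample size are the ones given above.

```python
from fractions import Fraction

strata = {"freshmen": 8000, "sophomores": 5000, "juniors": 4000, "seniors": 3000}
desired_sample_size = 2000

# Sampling fraction = desired sample size / population size
sampling_fraction = Fraction(desired_sample_size, sum(strata.values()))

# Apply the same fraction to every stratum
allocation = {name: int(size * sampling_fraction) for name, size in strata.items()}

print(sampling_fraction)   # 1/10
print(allocation)
```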

Page 25: Week 7   Sampling

Proportionate Stratified Sample

❖ When selecting characteristics on which to stratify a list, you should choose those expected to have a relationship with the dependent variable.

❖ Ex: Household income samples should stratify by education, sex and race.

❖ Ex: Members of Congress (MCs) stratified by party and experience.

❖ Ex: News stories by network.

Page 26: Week 7   Sampling

Disproportionate Stratified Sample

❖ In a disproportionate sample, some strata are overrepresented and others are underrepresented.

❖ Usually, strata making up a smaller percentage of the population are oversampled so we can make useful inferences about that group independent of the other strata.

❖ To prevent having a biased sample, each stratum is weighted by its proportion of the population.

Page 27: Week 7   Sampling

Disproportionate Stratified Sample

❖ JRM 223

❖ .625(Liberal Arts) + .125(Engineering) + .25(Business) = Mean Student Body GPA

❖ .625(2.5) + .125(3.3) + .25(2.7) = 2.65
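The weighting step can be checked with a short sketch. It reads the figures above as population shares (.625, .125, .25) and stratum mean GPAs (2.5, 3.3, 2.7); the dictionary names are illustrative.

```python
# Each stratum's share of the student body (the weights)
population_share = {"Liberal Arts": 0.625, "Engineering": 0.125, "Business": 0.25}

# Mean GPA observed in each stratum's sample
stratum_mean_gpa = {"Liberal Arts": 2.5, "Engineering": 3.3, "Business": 2.7}

# Weighted mean: sum of (share x stratum mean) over all strata
student_body_gpa = sum(
    population_share[s] * stratum_mean_gpa[s] for s in population_share
)

print(round(student_body_gpa, 2))   # 2.65
```

Weighting by population share, rather than by each stratum's share of the sample, is what keeps the oversampled strata from pulling the overall estimate toward their own mean.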

Page 28: Week 7   Sampling

Cluster Samples

❖ A cluster sample is a probability sample in which the sampling frame initially consists of clusters of elements.

Page 29: Week 7   Sampling

❖ NN 174

Page 30: Week 7   Sampling

Cluster Samples

❖ Suppose there are 500 blocks, and from these blocks 25 are chosen at random.

❖ On these 25 blocks, there are 4,000 dwelling units or households.

❖ One quarter of these households will be contacted because we desire a sample of 1,000 individuals.

Page 31: Week 7   Sampling

Cluster Samples

❖ Each household's probability of being selected is the same. How do we know this?

❖ 25/500 (probability that the household's block will be chosen) X 1000/4000 (probability of being surveyed if block is chosen) = 1/80
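The same two-stage calculation, written with exact fractions so the result is not rounded. The variable names are illustrative.

```python
from fractions import Fraction

# Stage 1: probability that a household's block is among the 25 chosen from 500
p_block_chosen = Fraction(25, 500)

# Stage 2: probability that a household is surveyed, given its block was chosen
p_surveyed_given_block = Fraction(1000, 4000)

# Overall selection probability for any household
p_household = p_block_chosen * p_surveyed_given_block

print(p_household)   # 1/80
```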

Page 32: Week 7   Sampling

Nonprobability Samples

❖ Sometimes an element's chance of being selected is unknown, and nonprobability samples have to be collected.

❖ Although probability samples are preferable, sometimes they are not feasible.

❖ Researchers may be able to learn more studying carefully selected, even unusual cases.

Page 33: Week 7   Sampling

Nonprobability Samples

❖ Purposive samples allow the researcher to have discretion in selecting elements for observation. (Ex: Fenno's Home Style)

❖ A quota sample is a sample in which elements are sampled in proportion to their representation in the population. It is similar to a stratified sample, but elements are not chosen probabilistically.

❖ In a snowball sample, respondents are asked to identify other persons who might qualify for inclusion in the sample.

Page 34: Week 7   Sampling

Statistical Inference

❖ Statistical inference involves the mathematical theory and techniques for making conjectures about the unknown characteristics (parameters) of populations based on samples.

Page 35: Week 7   Sampling

Statistical Inference

❖ Sample statistics provide us with estimates or approximations of population parameters.

❖ These estimates may differ from the “true” value of the population parameter, but if the sample is collected correctly and is large enough, the estimates are unlikely to be far from the truth.

Page 36: Week 7   Sampling

Statistical Inference

❖ We will focus on three concepts:

❖ Expected values

❖ Standard errors

❖ Sampling distributions

Page 37: Week 7   Sampling

Expected Value

❖ Expected value is the mean or average value of a sample statistic based on repeated samples from a population.

Page 38: Week 7   Sampling

Expected Value

❖ Suppose a candidate for state senate wants to know how many independent voters live in a district, which has grown rapidly during the last 10 years. Therefore, there are no reliable Census data available.

❖ Why might a state senator care about the number of independents in his/her district?

Page 39: Week 7   Sampling

Expected Value

❖ Suppose the true level of registered Independents is 25 percent, or .25.

❖ In formal terms, P = .25, where P = population parameter.

❖ You take the first sample. Two of 10 respondents say they are independents. Your first estimate, or sample statistic, has some sampling error.

❖ Specifically, the sampling error is the discrepancy (.05) between the population parameter (.25) and the sample statistic (.20).

Page 40: Week 7   Sampling

Expected Value

❖ What about four samples? Let's assume you conduct 4 samples of 10 and the proportions of independents are .20, .30, .40 and .20. Their mean is (.20 + .30 + .40 + .20)/4 = .275, an observed value not far from the true value of .25.

❖ Four samples of 10 bring us closer to the truth than one sample. What about 1,000 samples of 10? What about 1,000 samples of 50?
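The two closing questions can be explored with a small simulation. This is a sketch under stated assumptions: respondents are simulated as independent draws with a true proportion of .25, and the function names and seed are illustrative choices, not part of the lecture.

```python
import random

def sample_proportion(rng, true_p, n):
    """Proportion of independents among n simulated respondents."""
    return sum(rng.random() < true_p for _ in range(n)) / n

def mean_of_estimates(true_p, n, repetitions, seed):
    """Average the sample proportions from many repeated samples of size n."""
    rng = random.Random(seed)
    estimates = [sample_proportion(rng, true_p, n) for _ in range(repetitions)]
    return sum(estimates) / repetitions

P = 0.25   # true population proportion from the example
for n in (10, 50):
    print("samples of", n, "->", round(mean_of_estimates(P, n, 1000, seed=1), 3))
```

With 1,000 repetitions, the average of the estimates should land close to .25 for either sample size; what changes with the larger samples is how tightly the individual estimates cluster around it.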

Page 41: Week 7   Sampling

Expected Value

❖ If statistics (or observed values) are calculated for each of many, many independently and randomly chosen samples, their average or mean will equal the corresponding population parameter (or true value).

❖ Statisticians refer to this mean as the expected value (E) of the estimator.

Page 42: Week 7   Sampling

Expected Value

❖ If θ represents the population parameter, then θ̂ (theta hat) represents a sample estimator of that characteristic. We can then write:

E(θ̂) = θ

Page 43: Week 7   Sampling

Expected Value

❖ In the case of a sample proportion based on a simple random sample, we have:

❖ E(p) = P, where p is the sample proportion and P is the population proportion.

❖ In the long run, the average of the sample statistics would theoretically equal the true value, the population proportion.

Page 44: Week 7   Sampling

Sampling Distribution

❖ A sampling distribution of a sample statistic is a theoretical expression that describes the mean, variation, and shape of the distribution in an infinite number of occurrences of the statistic when calculated on samples of size N drawn independently and randomly from a population.
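For a sample proportion, the sampling distribution can be written out exactly instead of simulated. The sketch below assumes independent draws (a binomial model, reasonable when the sample is small relative to the population) and reuses P = .25 and samples of 10 from the earlier example; the function name is illustrative.

```python
from math import comb, sqrt

def sampling_distribution(true_p, n):
    """Map each possible sample proportion k/n to its probability."""
    return {
        k / n: comb(n, k) * true_p**k * (1 - true_p) ** (n - k)
        for k in range(n + 1)
    }

dist = sampling_distribution(0.25, 10)

# Mean and standard deviation of the sampling distribution
mean = sum(p * prob for p, prob in dist.items())
sd = sqrt(sum((p - mean) ** 2 * prob for p, prob in dist.items()))

for p, prob in dist.items():
    print(f"p = {p:.1f}  probability = {prob:.4f}")
print("mean:", round(mean, 3), " standard deviation:", round(sd, 3))
```

The mean of this distribution is the expected value of the estimator, and its standard deviation is the standard error.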

Page 45: Week 7   Sampling

❖ JRM 230

Page 46: Week 7   Sampling

❖ JRM 233

Page 47: Week 7   Sampling

❖ JRM 234

Page 48: Week 7   Sampling

❖ JRM 235

Page 49: Week 7   Sampling

Confidence and Error

❖ Confidence is the degree of belief that an estimated range of values, bounded by a low and a high value, includes or covers the population parameter. In political science this concept is normally described by a confidence interval.

❖ Standard error is the standard deviation, or measure of variability, of a sampling distribution. In other words, it tells us how much a sample statistic varies from one sample to the next.

Page 50: Week 7   Sampling

Standard Error

❖ The standard error of a reported proportion or percentage p measures its accuracy, and is the estimated standard deviation of that percentage. It can be estimated from just p and the sample size, n, if n is small relative to the population size, using the following formula:

SE(p) = √(p(1 − p) / n)

Page 51: Week 7   Sampling

Standard Error

√((.25 × .75) / 10) ≈ .14
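The calculation on this slide can be reproduced with a short function. It reads the slide's figures as p = .25 and n = 10; the function name is an illustrative choice.

```python
from math import sqrt

def standard_error(p, n):
    """Estimated standard error of a sample proportion p from a sample of size n."""
    return sqrt(p * (1 - p) / n)

se = standard_error(0.25, 10)
print(round(se, 2))   # 0.14
```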

Page 52: Week 7   Sampling

Confidence Intervals

❖ Get a poll, talk about it.

Page 53: Week 7   Sampling
Page 54: Week 7   Sampling

Gallup Poll Standard Error

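❖ Standard error of the poll:

√((.52 × .48) / 2761) ≈ .01

❖ If we want the margin of error for the poll, which is half the width of the 95 percent confidence interval, we multiply this number by 1.96.

❖ .01 × 1.96 = .0196 ≈ .02, or about 2 percentage points.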
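A sketch of the same calculation, reading the slide's figures as p = .52 and n = 2,761 and using 1.96 as the multiplier for a 95 percent interval. The variable names are illustrative.

```python
from math import sqrt

p = 0.52          # reported sample proportion
n = 2761          # number of respondents
Z_95 = 1.96       # multiplier for a 95 percent confidence interval

se = sqrt(p * (1 - p) / n)            # standard error of the proportion
margin_of_error = Z_95 * se           # half-width of the confidence interval
interval = (p - margin_of_error, p + margin_of_error)

print("standard error:", round(se, 2))
print("margin of error:", round(margin_of_error, 2))
print("95% confidence interval:", tuple(round(x, 2) for x in interval))
```

Carrying the unrounded standard error through the multiplication gives a margin slightly under .02, which rounds to the same 2 percentage points.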

Page 55: Week 7   Sampling

❖ Margin of Error Graphic

Page 56: Week 7   Sampling

Confidence Intervals

❖ Standard distribution photo