• Slide 1
• 13.1 - 13.3 Populations, Surveys and Random Sampling Kent: Mr. Simpson, how do you respond to the charges that petty vandalism such as graffiti is down eighty percent, while heavy sack-beatings are up a shocking 900%? Homer: Aw, people can come up with statistics to prove anything, Kent. Forty percent of all people know that. The Simpsons, "Homer the Vigilante"
• Slide 2
• So what is statistics, anyway? The gathering, organizing, interpreting and understanding of data.
• Slide 3
• The Population The complete set of individuals or objects about which we are seeking information is referred to as the population. A silly example: if Mayor Quimby takes a poll to find out how many Springfielders plan to vote for him in the next election, then the population is the voters of Springfield.
• Slide 4
• The N-value If it were possible to accurately count every member of a population, we would get a number, N, called the N-value of the population.
• Slide 5
• The N-value (A Point or Two) This value is often difficult to determine and can therefore require various adjustments. For instance, if a scientist were studying the effect of genetically modified corn on the monarch butterfly, it would be practically impossible to accurately calculate the N-value.
• Slide 6
• The N-value (A Point or Two) This value can change with time. In the case of the monarch butterfly, the number of actual insects will obviously be different from year to year.
• Slide 7
• Example: U.S. Census Article 1, Section 2: [...] Representatives and direct taxes shall be apportioned among the several states which may be included within this union, according to their respective numbers, which shall be determined by adding to the whole number of free persons, including those bound to service for a term of years, and excluding Indians not taxed, three fifths of all other Persons. The actual Enumeration shall be made within three years after the first meeting of the Congress of the United States, and within every subsequent term of ten years, in such manner as they shall by law direct.
• Slide 8
• What exactly is the census used for? Determination of taxes and political representation. Collects demographic information that is used to determine the allocation of federal money to state and local governments. Used to calculate government statistics like the Consumer Price Index.
• Slide 9
• Example: U.S. Census Why are we even talking about this? In the 18th century the U.S. population was for the most part small, immobile and homogeneous. Today it is the exact opposite. The result is that the modern census noticeably undercounts citizens. In fact, it differentially undercounts minorities and the poor. There are statistical methods that could correct some of the inaccuracy, but a Supreme Court ruling bars their use in determining Congressional apportionment.
• Slide 10
• Surveys Since collecting information from large populations is so difficult, researchers instead gather data from selected subgroups and use that information to make inferences regarding the population as a whole. This process is what math-and-science-y types refer to as a survey. The selected subgroup is called a sample.
• Slide 11
• Surveys (cont'd) There are two major issues when setting up a survey: 1. You need a sample that is representative of the population being studied. 2. You need a sample that is large enough to draw accurate information from, yet still small enough to be practical.
• Slide 12
• Example: Literary Digest and the 1936 Presidential Election Literary Digest was a popular magazine that had accurately predicted the winner in the five elections prior to 1936. That year, the publication ambitiously decided to poll 10 million Americans. The individuals contacted came from magazine subscription lists and telephone directory listings.
• Slide 13
• Example: Literary Digest and the 1936 Presidential Election When the results came in, 2.4 million people had responded and the survey predicted that the vote would end Landon: 57%, FDR: 43%. What actually happened?
• Slide 14
• The actual results were FDR: 61%, Landon: 36.5%, Other: 2.5%. (Landon, in fact, did not even carry his home state.)
• Slide 15
• George Gallup, however, made an accurate prediction with a sample of only 50,000 people. Why were his results superior?
• Slide 16
• There are two main reasons, both flaws in the Literary Digest poll: 1. The names were taken from phone directories and subscription lists, so the people surveyed were disproportionately wealthy. When a survey has an inherent tendency to exclude a segment of the population being studied, it is said to have selection bias. 2. Out of 10 million people contacted, only 24% replied. This is an example of what is called nonresponse bias, and it only magnified the first problem.
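• A minimal simulation sketch of why a huge biased sample can lose to a small random one. Everything here is invented for illustration: the electorate size, the 25% "wealthy" share, the 30%/70% support rates, and the function name simulate_polls are assumptions, not data from 1936. The bias is also exaggerated, since the large sample is drawn only from wealthy voters.

```python
import random

def simulate_polls(seed=0, population_size=200_000):
    """Compare a large biased sample with a small random one (toy numbers)."""
    rng = random.Random(seed)

    # Hypothetical electorate: each voter is (wealthy, favors_candidate_a).
    # Wealthy voters favor candidate A 30% of the time, everyone else 70%.
    population = []
    for _ in range(population_size):
        wealthy = rng.random() < 0.25
        favors_a = rng.random() < (0.30 if wealthy else 0.70)
        population.append((wealthy, favors_a))

    true_share = sum(favors for _, favors in population) / population_size

    # Large sample with selection bias: only wealthy voters can be reached.
    wealthy_frame = [favors for wealthy, favors in population if wealthy]
    big_biased = rng.sample(wealthy_frame, 20_000)
    biased_estimate = sum(big_biased) / len(big_biased)

    # Small simple random sample from the whole electorate.
    small_random = rng.sample(population, 1_000)
    random_estimate = sum(favors for _, favors in small_random) / len(small_random)

    return true_share, biased_estimate, random_estimate

true_share, biased_estimate, random_estimate = simulate_polls()
print(f"true share for A:        {true_share:.3f}")
print(f"biased sample (20,000):  {biased_estimate:.3f}")
print(f"random sample (1,000):   {random_estimate:.3f}")
```

The biased estimate stays far from the true share no matter how many wealthy voters are added, while the small random sample lands close to it.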
• Slide 17
• Example: Ain't the way I heard it. In 1935 Gallup had introduced a statistical method known as quota sampling. This technique was a systematic way of matching a sample to a designated profile. With a sample of 3,250 people specifically chosen as a cross-section of the country, Gallup predicted a result for the 1948 election of Dewey: 49.5%, Truman: 44.5%, Thurmond, Wallace, etc.: 6%.
• Slide 18
• The actual result was... Truman: 49.9%, Dewey: 44.5%, Others: 5%. So, what went wrong this time?
• Slide 19
• Several things went wrong in 1948. There are too many characteristics you could use for your quota. The methods used in 1948 did not take economic status into account and oversampled Republican voters. Most pollsters stopped gathering data when Dewey was coming in 13% ahead of Truman in some of the surveys.
• Slide 20
• Lessons to take from these occurrences... A small, well-chosen sample is better than a poorly chosen large one. Selection bias and nonresponse bias need to be taken into account. Don't stop surveying early. Quota sampling is flawed.
• Slide 21
• Random Sampling Random sampling: methods in which a level of chance is used to choose a sample. Simple random sampling: a larger-scale version of picking names out of a hat. The problem with simple random sampling is one of practicality.
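• The "names out of a hat" idea can be sketched with Python's standard random module. The voter list and the helper name simple_random_sample are made up for illustration. The sketch also shows where the practicality problem comes from: it needs a complete list of the population before any name can be drawn.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members, each subset of size n being equally likely."""
    rng = random.Random(seed)
    # random.sample picks without replacement, like drawing names from a hat.
    return rng.sample(list(population), n)

# Hypothetical population: a full list of every voter must exist up front.
voters = [f"voter_{i}" for i in range(10_000)]
sample = simple_random_sample(voters, 50, seed=1)
print(sample[:5])
```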
• Slide 22
• Random Sampling The solution, used in modern opinion polling, is stratified sampling. This method breaks the population down into strata (categories) and then randomly chooses a sample from the strata. The strata are then divided into substrata and the process is continued.
• Slide 23
• Example: Opinion Polls Modern opinion polls construct their strata as follows: 1. The nation is divided into size-of-community strata. 2. These strata are divided by geographic location. 3. Communities in each geographic region are picked randomly. 4. Wards, precincts and households are then picked randomly.
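• The four steps above can be sketched as a nested random selection. This is a toy illustration, not a real polling procedure: the nation dictionary, the stratum names, the sample counts and the function multistage_sample are all hypothetical, and step 4 is collapsed so that households are drawn directly from each chosen community instead of going through wards and precincts.

```python
import random

def multistage_sample(nation, communities_per_cell, households_per_community, seed=None):
    """Pick households by walking size strata, then regions, then communities."""
    rng = random.Random(seed)
    chosen = []
    for size_class, regions in nation.items():        # step 1: size-of-community strata
        for region, communities in regions.items():   # step 2: geographic substrata
            k = min(communities_per_cell, len(communities))
            for name in rng.sample(sorted(communities), k):  # step 3: random communities
                households = communities[name]
                m = min(households_per_community, len(households))
                for household in rng.sample(households, m):  # step 4: random households
                    chosen.append((size_class, region, name, household))
    return chosen

# Hypothetical nation: 2 size classes x 2 regions x 5 communities x 20 households.
nation = {
    size: {
        region: {
            f"{size}-{region}-town{t}": [f"household{h}" for h in range(20)]
            for t in range(5)
        }
        for region in ("north", "south")
    }
    for size in ("large", "small")
}

picked = multistage_sample(nation, communities_per_cell=2,
                           households_per_community=3, seed=0)
print(len(picked), picked[0])
```

Because every size-and-region cell contributes to the sample, no cell can be left out by chance, and only the chosen communities need a household list.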
• Slide 24
• Example: Opinion Polls The result is an efficient method that generally yields accurate results. Usually, 1,000 to 2,000 people are needed for an accurate opinion poll. The required sample size does not depend on the size of the population being studied.
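• One way to see why population size barely matters is the textbook 95% margin-of-error formula for a proportion from a simple random sample. This formula is not from the slides, and real polls use stratified designs, so treat it as an approximation. The function name margin_of_error and the population sizes below are illustrative assumptions.

```python
import math

def margin_of_error(n, population_size=None, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    p=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    If population_size is given, the finite population correction is applied.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    return z * standard_error

for population_size in (100_000, 10_000_000, 300_000_000):
    moe = margin_of_error(1_000, population_size)
    print(f"N = {population_size:>11,}: margin of error = {moe:.4f}")

print(f"n = 2,000, no correction: {margin_of_error(2_000):.4f}")
```

With 1,000 respondents the margin is about 3.1 percentage points whether the population is a hundred thousand or hundreds of millions; doubling the sample to 2,000 brings it down to about 2.2 points.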
• Slide 25
