§ 13.1 - 13.3 Populations, Surveys and Random Sampling

Kent: “Mr. Simpson, how do you respond to the charges that petty vandalism such as graffiti is down eighty percent, while heavy sack-beatings are up a shocking 900%?”

Homer: “Aw, people can come up with statistics to prove anything, Kent. Forty percent of all people know that.”

The Simpsons, “Homer the Vigilante”

So… what is statistics, anyway?

The gathering, organizing, interpreting and understanding of data.

The Population

The complete set of individuals or objects about which we are seeking information is referred to as the population.

A silly example: If Mayor Quimby takes a poll to find how many Springfielders plan to vote for him in the next election, then the population would be the voters of Springfield.

The N-value

If it were possible to accurately count every member of a population, we would get a number, N, called the N-value of the population.

The N-value (A Point or Two)

This value is often difficult to determine and can therefore require various adjustments.

For instance, if a scientist were studying the effect of genetically modified corn on the monarch butterfly, it would be practically impossible to accurately calculate the N-value.

The N-value (A Point or Two)

This value can change with time.

In the case of the monarch butterfly, the number of actual insects will obviously be different from year-to-year.

Article 1, Section 2:

[. . .]

Representatives and direct taxes shall be apportioned among the several states which may be included within this union, according to their respective numbers, which shall be determined by adding to the whole number of free persons, including those bound to service for a term of years, and excluding Indians not taxed, three fifths of all other Persons. The actual Enumeration shall be made within three years after the first meeting of the Congress of the United States, and within every subsequent term of ten years, in such manner as they shall by law direct.

Example: U.S. Census

What exactly is the census used for?

Determining taxes and political representation.

Collecting demographic information that is used to determine the allocation of federal money to state and local governments.

Calculating government statistics like the Consumer Price Index.

Example: U.S. Census

Why are we even talking about this?

In the 18th century, the U.S. population was for the most part small, immobile and homogeneous.

Today: the exact opposite.

The result is that the modern census noticeably undercounts the population. In fact, it differentially undercounts minorities and the poor.

There are statistical methods that could be used to correct some of the inaccuracy, but under a Supreme Court ruling they cannot be used for determining Congressional apportionment.

Surveys

Since collecting information from large populations is so difficult, researchers instead gather data from selected subgroups and use that information to make inferences regarding the population as a whole.

This process is what math-and-science-y types refer to as a survey.

The selected subgroup is called a sample.

Surveys (cont’d)

There are two major issues when setting up a survey:

1. You need a sample that is representative of the population being studied.

2. You need a sample that is large enough to draw accurate information from, yet still small enough to be practical.

Example: Literary Digest and the 1936 Presidential Election

Literary Digest was a popular magazine that had accurately predicted the winner in the five elections prior to 1936.

That year, the publication ambitiously decided to poll 10 million Americans.

The individuals contacted came from magazine subscription lists and telephone directory listings.

Example: Literary Digest and the 1936 Presidential Election

When the results came in, 2.4 million people had responded, and the survey predicted that the vote would end Landon: 57%, FDR: 43%.

What actually happened?

The actual results were FDR: 61%, Landon: 36.5%, Other: 2.5%. (Landon, in fact, did not even carry his home state.)

George Gallup, however, made an accurate prediction with a sample of only 50,000 people.

Why were his results superior?

There are two main reasons the Literary Digest poll failed:

1. The names were taken from phone directories and subscription lists, so the people surveyed were disproportionately wealthy. When a survey has an inherent tendency to exclude a segment of the population being studied, it is said to have selection bias.

2. Out of 10 million people contacted, only 24% replied. This is an example of what is called nonresponse bias, and it only magnified the first problem.
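To see why a huge sample cannot rescue a biased one, here is a minimal simulation sketch. Every number in it (the share of wealthy voters, the voting preferences, the sample sizes) is invented for illustration and is not the 1936 data; `simulate_polls` is a hypothetical helper name.

```python
import random

def simulate_polls(seed=0, population_size=100_000):
    """Compare a large biased sample with a small simple random sample.

    All rates below are made-up assumptions, not historical figures.
    """
    rng = random.Random(seed)

    # Each voter is (is_wealthy, votes_for_a). Assume 20% are wealthy;
    # wealthy voters back candidate A 30% of the time, everyone else 65%.
    population = []
    for _ in range(population_size):
        wealthy = rng.random() < 0.20
        votes_a = rng.random() < (0.30 if wealthy else 0.65)
        population.append((wealthy, votes_a))

    true_share = sum(v for _, v in population) / population_size

    # Selection bias: the mailing list (sampling frame) holds only wealthy voters.
    frame = [v for w, v in population if w]
    biased_sample = rng.sample(frame, 10_000)
    biased_share = sum(biased_sample) / len(biased_sample)

    # A much smaller simple random sample drawn from the whole population.
    random_sample = rng.sample(population, 1_000)
    random_share = sum(v for _, v in random_sample) / len(random_sample)

    return true_share, biased_share, random_share

true_share, biased_share, random_share = simulate_polls()
print(f"true: {true_share:.3f}  biased (n=10000): {biased_share:.3f}  "
      f"random (n=1000): {random_share:.3f}")
```

Under these assumptions the sample of 10,000 lands far from the true share, while the sample of 1,000 lands close to it: sample size does nothing to repair a frame that excludes part of the population.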

Example: “Ain’t the way I heard it.”

In 1935 Gallup had introduced a statistical method known as quota sampling. This technique was a systematic way of matching a sample to a designated profile.

With a sample of 3250 people specifically chosen as a cross-section of the country, Gallup predicted a result for the 1948 election of Dewey: 49.5%, Truman: 44.5%, Thurmond, Wallace, etc.: 6%.

The actual result was Truman: 49.9%, Dewey: 44.5%, Others: 5%.

So, what went wrong this time?

There are too many characteristics you could use for your quota.

The methods used in 1948 did not take economic status into account and oversampled Republican voters.

Most pollsters stopped gathering data when Dewey was coming in 13% ahead of Truman in some of the surveys.

Lessons to take from these occurrences. . .

A small, well-chosen sample is better than a poorly-chosen large one.

Selection bias and nonresponse bias need to be taken into account.

Don’t stop surveying early.

Quota sampling is flawed.

Random Sampling

Random sampling: methods in which an element of chance is used to choose a sample.

Simple random sampling: a larger scale version of picking names out of a hat.

The problem with simple random sampling is one of practicality.
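A minimal sketch of "names out of a hat" using Python's standard library. The voter list is invented for illustration, and `simple_random_sample` is a hypothetical helper name. The sketch also hints at the practicality problem: the call needs a complete list of the population up front, which is rarely available for something the size of a national electorate.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members so that every subset of size k is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, k)

# Hypothetical population: a complete list of 100 voters.
voters = [f"voter_{i}" for i in range(1, 101)]
sample = simple_random_sample(voters, 10, seed=42)
print(sample)
```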

Random Sampling

The solution--used in modern opinion polling--is stratified sampling.

This method breaks the population down into strata (categories) and then randomly chooses a sample from each stratum.

The strata are then divided into substrata and the process is continued…
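One level of this process can be sketched as follows. This is an illustration under assumptions, not the procedure any particular polling firm uses: it takes the same fraction from every stratum (proportional allocation), the household data is invented, and `stratified_sample` is a hypothetical helper name.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Group units into strata, then draw a random sample within each stratum.

    Assumes proportional allocation: the same fraction is taken from every stratum.
    """
    rng = random.Random(seed)

    # Step 1: break the population down into strata.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    # Step 2: randomly choose a sample from each stratum.
    sample = []
    for name in sorted(strata):  # sorted so the result is repeatable for a given seed
        members = strata[name]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population of 1,000 households labelled by community type.
households = (
    [("urban", i) for i in range(600)]
    + [("suburban", i) for i in range(300)]
    + [("rural", i) for i in range(100)]
)
sample = stratified_sample(households, lambda h: h[0], 0.10, seed=1)
print(len(sample))
```

Dividing into substrata would amount to applying the same function again inside each stratum with a finer `stratum_of` rule.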

Example: Opinion Polls

Modern opinion polls construct their strata as follows:

1. The nation is divided into “size of community” strata.

2. These strata are divided by geographic location.

3. Communities in each geographic region are picked randomly.

4. Wards, precincts and households are then chosen randomly.

Example: Opinion Polls

The result is an efficient method that generally yields accurate results.

Usually, a sample of 1,000 to 2,000 people is enough for an accurate opinion poll. For large populations, this number is essentially independent of the size of the population being studied.
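A rough way to check this claim is the textbook margin-of-error formula for a proportion. The sketch below assumes simple random sampling, a 95% confidence level (z = 1.96) and the worst case p = 0.5. Real polls use stratified designs, so treat the numbers as ballpark figures; `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(n, population_size=None, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample of size n. If population_size is given,
    the finite population correction is applied.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        # The correction only matters when n is a sizeable share of the population.
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    return z * standard_error

for n in (1_000, 2_000):
    print(n,
          round(margin_of_error(n, population_size=1_000_000), 4),
          round(margin_of_error(n, population_size=300_000_000), 4))
```

With 1,000 respondents the margin of error is roughly 3 percentage points whether the population is one million or three hundred million, because the correction factor is almost exactly 1 in both cases.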
