§ 13.1 - 13.3 Populations, Surveys and Random Sampling

Kent: “Mr. Simpson, how do you respond to the charges that petty vandalism such as graffiti is down eighty percent, while heavy sack-beatings are up a shocking 900%?”

Homer: “Aw, people can come up with statistics to prove anything, Kent. Forty percent of all people know that.”

The Simpsons, `Homer the Vigilante`


So… what is statistics, anyway?

The gathering, organizing, interpreting and understanding of data.

The Population

The complete set of individuals or objects about which we are seeking information is referred to as the population.

A silly example: If Mayor Quimby takes a poll to find how many Springfielders plan to vote for him in the next election, then the population would be the voters of Springfield.

The N-value

If it were possible to accurately count every member of a population, we would get a number, N, called the N-value of the population.

The N-value (A Point or Two)

This value is often difficult to determine, and the count can therefore require various adjustments.

For instance, if a scientist were studying the effect of genetically modified corn on the monarch butterfly, it would be practically impossible to accurately calculate the N-value.

The N-value (A Point or Two)

This value can change with time.

In the case of the monarch butterfly, the number of actual insects will obviously be different from year-to-year.

Example: U.S. Census

U.S. Constitution, Article 1, Section 2:

[. . .]

Representatives and direct taxes shall be apportioned among the several states which may be included within this union, according to their respective numbers, which shall be determined by adding to the whole number of free persons, including those bound to service for a term of years, and excluding Indians not taxed, three fifths of all other Persons. The actual Enumeration shall be made within three years after the first meeting of the Congress of the United States, and within every subsequent term of ten years, in such manner as they shall by law direct.


Example: U.S. Census

What exactly is the census used for?

Determination of taxes and political representation.

Collects demographic information that is used to determine the allocation of federal money to state and local governments.

Used to calculate government statistics like the Consumer Price Index.


Example: U.S. Census

Why are we even talking about this?

In the 18th century, the U.S. population was for the most part small, immobile and homogeneous.

Today = The Exact Opposite.

The result is that the modern census noticeably undercounts citizens. In fact, it differentially undercounts minorities and the poor.

There are statistical methods that could be used to correct some of the inaccuracy but, by Supreme Court ruling, they cannot be used to determine Congressional apportionment.

Surveys

Since collecting information from large populations is so difficult, researchers instead gather data from selected subgroups and use that information to make inferences regarding the population as a whole.

This process is what math-and-science-y types refer to as a survey.

The selected subgroup is called a sample.

Surveys (cont’d)

There are two major issues when setting up a survey:

1. You need a sample that is representative of the population being studied.

2. You need a sample that is large enough to draw accurate information from, yet still small enough to be practical.

Example: Literary Digest and the 1936 Presidential Election

Literary Digest was a popular magazine that had accurately predicted the winner in the five elections prior to 1936.

That year, the publication ambitiously decided to poll 10 million Americans.

The individuals contacted came from magazine subscription lists and telephone directory listings.

Example: Literary Digest and the 1936 Presidential Election

When the results came in, 2.4 million people had responded, and the survey predicted that the vote would end:

Landon: 57%   FDR: 43%

What actually happened?

The actual results were:

FDR: 61%   Landon: 36.5%   Other: 2.5%

(Landon, in fact, did not even carry his home state.)

George Gallup, however, made an accurate prediction with a sample of only 50,000 people.

Why were his results superior? The Literary Digest poll had two main problems:

1. The names were taken from phone directories and subscription lists, so the people surveyed were disproportionately wealthy. When a survey has an inherent tendency to exclude a segment of the population being studied, it is said to have selection bias.

2. Out of 10 million people contacted, only 24% replied. This is an example of what is called nonresponse bias, and it only magnified the first problem.


Example: “Ain’t the way I heard it.”

In 1935 Gallup had introduced a statistical method known as quota sampling. This technique was a systematic way of matching a sample to a designated profile.

With a sample of 3250 people specifically chosen as a cross-section of the country, Gallup predicted a result for the 1948 election of:

Dewey: 49.5%   Truman: 44.5%   Thurmond, Wallace, etc.: 6%

The actual result was. . .

Truman: 49.9%   Dewey: 44.5%   Others: 5%

So, what went wrong this time?

There are too many characteristics you could use for your quota.

The methods used in 1948 did not take economic status into account and oversampled Republican voters.

Most pollsters stopped gathering data when Dewey was coming in 13% ahead of Truman in some of the surveys.

Lessons to take from these occurrences. . .

A small, well-chosen sample is better than a poorly-chosen large one.

Selection bias and nonresponse bias need to be taken into account.

Don’t stop surveying early.

Quota sampling is flawed.
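The first two lessons can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population size, the share of "wealthy" voters, and the support levels for a hypothetical candidate A are made-up numbers, not historical data from 1936 or 1948. It compares a large sample drawn only from the wealthy group with a small simple random sample drawn from everyone.

```python
import random

def make_population(n, wealthy_share, p_wealthy, p_other, rng):
    """Build a made-up electorate: each voter is (is_wealthy, votes_for_a)."""
    population = []
    for _ in range(n):
        is_wealthy = rng.random() < wealthy_share
        p = p_wealthy if is_wealthy else p_other
        population.append((is_wealthy, rng.random() < p))
    return population

def share_for_a(voters):
    """Fraction of the given voters who support candidate A."""
    return sum(vote for _, vote in voters) / len(voters)

rng = random.Random(1936)  # fixed seed so the run is repeatable

# Assumed numbers: 25% of voters are wealthy; 30% of the wealthy and
# 70% of everyone else support candidate A.
population = make_population(200_000, 0.25, 0.30, 0.70, rng)

# Large but biased sample: only wealthy voters can be reached.
wealthy_only = [v for v in population if v[0]]
biased_sample = rng.sample(wealthy_only, 20_000)

# Small simple random sample drawn from the whole population.
random_sample = rng.sample(population, 1_000)

true_share = share_for_a(population)
biased_error = abs(share_for_a(biased_sample) - true_share)
random_error = abs(share_for_a(random_sample) - true_share)

print(f"true share for A:       {true_share:.3f}")
print(f"biased sample (20,000): {share_for_a(biased_sample):.3f}")
print(f"random sample (1,000):  {share_for_a(random_sample):.3f}")
```

Under these assumptions, the biased sample misses by a wide margin no matter how large it is, while the much smaller random sample lands close to the true share.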

Random Sampling

Random sampling: methods in which an element of chance is used to choose a sample.

Simple random sampling: a larger scale version of picking names out of a hat.

The problem with simple random sampling is one of practicality.
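Picking names out of a hat can be sketched in a few lines. The voter list and the helper name `simple_random_sample` below are hypothetical and exist only for illustration.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Pick k distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)

# Hypothetical complete list of the population (the "hat").
voters = [f"voter_{i}" for i in range(1, 10_001)]

sample = simple_random_sample(voters, 100, seed=42)
print(sample[:5])
```

The sketch needs the complete list `voters` before it can draw anything, which is one way to see the practicality problem: for a large population such a list is rarely available.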

Random Sampling

The solution, used in modern opinion polling, is stratified sampling.

This method breaks the population down into strata (categories) and then randomly chooses a sample from the strata.

The strata are then divided into substrata and the process is continued…
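A single level of stratification might look like the sketch below. It assumes that each stratum contributes the same fraction of its members (proportional allocation), which is one common choice and not necessarily what any particular polling organization does. The population and the category names are invented.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Group members by stratum, then draw a simple random sample inside each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)

    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population of (person id, community type) pairs.
people = (
    [(i, "city") for i in range(6000)]
    + [(i, "town") for i in range(6000, 9000)]
    + [(i, "rural") for i in range(9000, 10000)]
)

sample = stratified_sample(people, lambda person: person[1], 0.01, seed=7)
counts = Counter(kind for _, kind in sample)
print(counts)
```

Because every stratum is sampled separately, each category appears in the sample in the same proportion as in the population, which a simple random sample only achieves on average.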


Example: Opinion Polls

Modern opinion polls construct their strata as follows:

1. The nation is divided into “size of community” strata.

2. These strata are divided by geographic location.

3. Communities in each geographic region are picked randomly.

4. Wards, precincts and households are then found randomly.
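The four steps above can be sketched as a multistage draw. The sampling frame below is invented, the function name `multistage_sample` is hypothetical, and the ward and precinct stages are collapsed into a single household stage to keep the example short.

```python
import random

def multistage_sample(frame, communities_per_stratum, households_per_community, seed=None):
    """frame maps (size_category, region) -> {community name -> list of households}."""
    rng = random.Random(seed)
    chosen = []
    for stratum in sorted(frame):                      # steps 1 and 2: the strata
        communities = frame[stratum]
        n_communities = min(communities_per_stratum, len(communities))
        picked = rng.sample(sorted(communities), n_communities)   # step 3
        for community in picked:
            households = communities[community]
            k = min(households_per_community, len(households))
            for household in rng.sample(households, k):           # step 4
                chosen.append((stratum, community, household))
    return chosen

# Hypothetical frame: 2 size categories x 2 regions, 10 communities each,
# 50 households per community.
frame = {
    (size, region): {
        f"{size}-{region}-{c}": [f"household {h}" for h in range(50)]
        for c in range(10)
    }
    for size in ("large", "small")
    for region in ("east", "west")
}

sample = multistage_sample(frame, communities_per_stratum=2,
                           households_per_community=5, seed=3)
print(len(sample))
```

Only the communities that are actually picked need a household list, which is what makes this approach more practical than a simple random sample of the whole nation.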


Example: Opinion Polls

The result is an efficient method that generally yields accurate results.

Usually, a sample of 1000-2000 people is enough for an accurate opinion poll. This number depends very little on the size of the population being studied.
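A rough way to see why 1000-2000 people suffice is the textbook margin of error for a sample proportion. The sketch below assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst case p = 0.5. Real polls use stratified designs, so their actual margins differ somewhat. The optional finite population correction shows how little the population size matters.

```python
import math

def margin_of_error(n, p=0.5, z=1.96, population_size=None):
    """Approximate margin of error for a proportion estimated from n respondents."""
    standard_error = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        # Finite population correction: close to 1 when the population is
        # much larger than the sample.
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    return z * standard_error

for n in (1000, 2000):
    print(n, round(margin_of_error(n), 3))

# Same sample size, very different (hypothetical) population sizes.
small_population = margin_of_error(1000, population_size=1_000_000)
large_population = margin_of_error(1000, population_size=300_000_000)
print(round(small_population, 4), round(large_population, 4))
```

With 1000 respondents the margin is about 3 percentage points and with 2000 about 2, and it is nearly the same whether the population is one million or three hundred million.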
