What is statistics?

Page 1: What is statistics?

What is statistics?

• Statistics is the science of dealing with data.

• Data is any type of info packaged in numerical form.

• Common examples: Political polls, Health/medical studies

Page 2: What is statistics?

Some Basic Definitions

• Population: collection of individuals or objects we want to study statistically

• “What is the population to which the statistical statement applies?”

• N-value: how many individuals/objects there are in the population

Page 3: What is statistics?

Example

• Study: What percentage of the M&Ms in the jar are blue?

• Population: all of the M&Ms in the jar

• N-value: 4392

Page 4: What is statistics?

Census

• Census: the process of collecting data by going through every member of the population

• Our example: Count all M&Ms in the jar, count all of the blue ones, find percentage.

• Drawbacks:
  – Expensive
  – Too much work
  – Almost impossible for large populations

Page 5: What is statistics?

Census vs Survey

• Census: the process of collecting data by going through every member of the population

• Survey: process of collecting data only from some members of the population (and use that data to draw conclusions & make inferences about the entire population)

• Poll: data collection done by asking questions

Page 6: What is statistics?

Use samples!

• Sample: a subgroup of the population chosen to provide the data

• Sampling: the act of selecting a sample
  – Finding a good sample is EXTREMELY DIFFICULT!!!!

• Sampling frame: the actual subset of the population from which the sample will be drawn

Page 7: What is statistics?

Example

• Study: What percentage of our class likes cheeseburgers?

• Population: all members of our class

• N-value: 20

• Sampling frame: all of the women in our class

• A sample: all of the women in our class who are present today

Page 8: What is statistics?

Sampling frames make a difference!

• CNN/USA Today/Gallup Poll, Nov 2004: If the election for Congress were being held today, which party’s candidate would you vote for in your district?

• Asked of 1866 registered voters nationwide: 49% for Dem, 47% for Rep, 4% undecided

• Asked of 1573 likely voters nationwide: 50% for Rep, 46% for Dem, 3% undecided

Page 9: What is statistics?

Representative Samples

• When a population is highly homogeneous, a very small sample may be representative
  – Ex: blood samples, thoroughly mixed cake batter, etc.

• More heterogeneous populations -> more difficult to find representative samples

Page 10: What is statistics?

Are these samples representative?

• Question: What is the average time it takes a UNL student to walk to class?

• Samples:
  – All students living in dorms
  – All students who use city buses
  – All students in the Union at noon
  – All students currently taking math classes

Page 11: What is statistics?

1936 Literary Digest Poll

• US presidential election: Alfred Landon (R) vs incumbent Franklin D Roosevelt (D)

• Sampling frame included:
  – Every person listed in a telephone directory anywhere in the US
  – Every person on a magazine subscription list
  – Every person listed on the roster of a club or professional association
  – A list of 10 million people was created, to whom mock ballots were mailed

Page 12: What is statistics?

1936 Literary Digest Poll

• Poll predicted Landon with 57% of vote vs Roosevelt’s 43%

• Reality: 62% for Roosevelt and 38% for Landon

• What went wrong?!
  – Think about the sample.
  – Representative?
  – Biased?

Page 13: What is statistics?

Bias

• Selection bias: when the choice of the sample has a built-in tendency to exclude a particular group or characteristic within the population

• Literary Digest poll only had 24% response rate

• Low response rate -> nonresponse bias (selection bias)

Page 14: What is statistics?

Lots of different kinds of bias

• Leading-question bias:
  – Are you in favor of paying higher taxes to bail the federal government out of its disastrous economic policies and its mismanagement of the federal budget?

• Question order bias

• Afraid to answer bias:
  – Have you ever cheated on your income taxes?

Page 15: What is statistics?

Morals

• Bigger samples aren’t necessarily better samples!

• Watch out for different types of bias!

• A representative sample is key!

Page 16: What is statistics?

Lots of Sampling Methods

• Convenience sampling: selection of individuals included in the sample is dictated by what is easiest or cheapest
  – Notoriously bad!
  – Ex: Want to know the average score on the last quiz? Sample: Look at the scores of the people sitting next to you.
  – Ex: Want to know how people feel about making the switch to the Big Ten? Sample: Set up a table outside of your house for people to come by and fill out a questionnaire.

Page 17: What is statistics?

Quota sampling

• Quota sampling: the sample should have so many women, so many men, so many Christians, so many Muslims, so many urban-dwellers, so many rural farmers, etc

• The proportions in each category in the sample should be the same as those in the population

Page 18: What is statistics?

Example of quota sampling

• Intro to Stats has 120 students
  – 40 freshmen
  – 30 sophomores
  – 30 juniors
  – 20 seniors

• To fill out a questionnaire, the prof selects
  – 24 freshmen
  – 18 sophomores
  – 18 juniors
  – 12 seniors
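
The quotas in this example can be checked with a few lines of Python. This is only a sketch: the function name `quota_allocation` is made up for illustration, and the sample size of 72 is simply the sum of the four quotas on the slide (24 + 18 + 18 + 12).

```python
def quota_allocation(strata_sizes, sample_size):
    """Split sample_size across categories in proportion to their share of the population."""
    total = sum(strata_sizes.values())
    # In general, rounding can leave the quotas a person or two off sample_size;
    # with the class numbers below every quota comes out whole.
    return {name: round(size * sample_size / total)
            for name, size in strata_sizes.items()}


# Class counts from the slide; 72 = 24 + 18 + 18 + 12.
intro_to_stats = {"freshmen": 40, "sophomores": 30, "juniors": 30, "seniors": 20}
quotas = quota_allocation(intro_to_stats, sample_size=72)
print(quotas)  # {'freshmen': 24, 'sophomores': 18, 'juniors': 18, 'seniors': 12}
```

Each class year is sampled at the same rate (72/120 = 60%), which is what makes the sample proportions match the population proportions. Note that quota sampling fixes only the counts; it does not say how the individuals within each category get picked.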

Page 19: What is statistics?

1948 US Presidential Election

• Gallup poll used detailed quota sampling

• Sample size: 3250 people

• Prediction vs reality:
  – Thomas Dewey: 49.5% / 44.5%
  – Harry Truman: 44.5% / 49.9%

• What went wrong?

Page 20: What is statistics?

Simple Random Sampling

• SRS: all members of the population have an equal chance at being included in the sample (and every group of the same size is equally likely to be the sample)

• How were previous examples not SRS?

• Examples of methods:
  – Pull names from a hat
  – Flip a coin
  – Random number generator
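
The "random number generator" method might look like the sketch below, using Python's standard `random` module. The roster names and the sample size of 5 are made up for illustration; only the class size of 20 comes from the earlier cheeseburger example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every group of n members is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable for checking
    return rng.sample(population, n)


# Hypothetical roster for a class of 20 students.
class_roster = [f"student_{i}" for i in range(1, 21)]
sample = simple_random_sample(class_roster, n=5, seed=0)
print(sample)
```

`random.sample` draws without replacement, so nobody can be picked twice. This is the computer version of pulling names from a hat.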

Page 21: What is statistics?

Stratified Sampling

• Break the sampling frame into categories (strata), then randomly choose a sample from these strata

• Those chosen strata are subdivided into substrata, and a random sample taken.

• Subdivide again and take a random sample, etc

• End up with clusters, but usually reliable
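
A two-stage version of the procedure above can be sketched as follows. Everything here is an assumption made for illustration: the neighborhoods, the house labels, and the function name `multistage_sample` are hypothetical, and a real survey would repeat the subdivide-and-sample step more times.

```python
import random


def multistage_sample(frame, n_strata, n_per_stratum, seed=None):
    """Stage 1: randomly choose strata. Stage 2: randomly choose units inside each."""
    rng = random.Random(seed)
    chosen_strata = rng.sample(sorted(frame), n_strata)
    return {stratum: rng.sample(frame[stratum], n_per_stratum)
            for stratum in chosen_strata}


# Hypothetical sampling frame: 6 neighborhoods (strata) with 10 houses each.
frame = {f"neighborhood_{i}": [f"house_{i}_{j}" for j in range(10)]
         for i in range(6)}
picked = multistage_sample(frame, n_strata=2, n_per_stratum=3, seed=1)
for neighborhood, houses in picked.items():
    print(neighborhood, houses)
```

The result is a handful of clusters (here 2 neighborhoods with 3 houses each), which is cheaper to survey than houses scattered across the whole frame.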

Page 22: What is statistics?

Stratified Sampling Example

Page 23: What is statistics?
Page 24: What is statistics?
Page 25: What is statistics?

Now survey these houses!

Page 26: What is statistics?

More Definitions

• Statistic: Numerical information drawn from a sample

• Parameter: unknown measure (numerical info) from the population

• Hopefully, the statistic will be close to the parameter so conclusions made about the sample will be true for the whole population.

Page 27: What is statistics?

Error and Bias

• Sampling error: the difference between the parameter (estimated) and the statistic

• Sampling error attributed to:
  – Chance error
  – Sampling variability: different samples give different results
  – Sampling bias: bad sample chosen
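
Sampling variability is easy to see in a small simulation. The sketch below reuses the M&M jar (N = 4392 from the earlier example), but the number of blue candies is NOT given on the slides, so the 1000 used here is a made-up value. The sign convention (statistic minus parameter) is also an assumption.

```python
import random

N = 4392              # jar size from the M&M example
ASSUMED_BLUE = 1000   # hypothetical: the slides never state the blue count
jar = ["blue"] * ASSUMED_BLUE + ["other"] * (N - ASSUMED_BLUE)


def proportion_blue(candies):
    """Fraction of the given candies that are blue."""
    return sum(c == "blue" for c in candies) / len(candies)


parameter = proportion_blue(jar)  # a census: the true population value


def sampling_error(n, rng):
    """Statistic from one simple random sample of size n, minus the parameter."""
    sample = rng.sample(jar, n)
    return proportion_blue(sample) - parameter


rng = random.Random(42)
errors = [sampling_error(100, rng) for _ in range(5)]
print(f"parameter = {parameter:.3f}")
print("errors from five samples of 100:", [f"{e:+.3f}" for e in errors])
```

Five samples generally give five different errors even though nothing about the sampling method is biased; that spread is chance error. Taking the whole jar as the "sample" is a census and leaves no sampling error at all.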

Page 28: What is statistics?

Sample Size

• Population size = N

• Sample size = n

• Sampling proportion = n/N

• Modern public opinion polls: 1000 ≤ n ≤ 1500

Page 29: What is statistics?

Capture-Recapture

• Used to estimate the N-value

• Steps:
  – Choose a sample of size n1, tag the members, and release.
  – After some time, capture a new sample of size n2 and take an exact head count of tagged individuals. Call that number k.
  – The N-value is approximately N ≈ (n1*n2)/k

Page 30: What is statistics?

Small fish in a big pond

• A pond of fish!

• First capture: n1 = 200 fish. Tag them.

• Second capture: n2 = 150 fish. Notice that k = 21 of these fish have tags.

• There are approximately N ≈ (200*150)/21 ≈ 1429 fish
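
The pond calculation can be written as a tiny function. This is a sketch of the estimate used on this slide (first capture times second capture, divided by the tagged count); the function and argument names are made up for illustration.

```python
def capture_recapture_estimate(first_capture, second_capture, tagged_in_second):
    """Estimate population size N as (first capture * second capture) / tagged count."""
    if tagged_in_second <= 0:
        raise ValueError("need at least one tagged individual in the second capture")
    return first_capture * second_capture / tagged_in_second


# Numbers from the pond example: 200 tagged, 150 recaptured, 21 of them tagged.
estimate = capture_recapture_estimate(200, 150, 21)
print(f"{estimate:.1f}")  # 1428.6
```

The idea behind the formula: if the tagged fish have mixed back in evenly, the tagged fraction of the second capture (21/150) should be close to the tagged fraction of the whole pond (200/N).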

Page 31: What is statistics?

Clinical Studies

• Try to study cause and effect, whereas surveys just observe and report

CORRELATION DOES NOT IMPLY CAUSATION!!!!!!!!!!

Page 32: What is statistics?

Alar Scare

• Alar: chemical used by apple growers

• 1973: mice exposed to the active chemicals in Alar at a dose 8 times greater than the max tolerated dosage
  – A child would have to eat 200,000 apples per day to get that dosage

• Alar doesn’t really cause cancer, but it is no longer used. The Washington State apple industry lost $375 million.

Page 33: What is statistics?

Clinical studies

• Concerned with determining whether a single variable or treatment (vaccine, drug, therapy, etc) can cause a certain effect (disease, symptom, cure, etc)

• Confounding variables: all other possible contributing causes that could produce the same effect

• First step: isolate the treatment under investigation from confounding variables

Page 34: What is statistics?

Controlled Study

• Subjects are divided into two different groups:
  – Treatment group: consists of subjects receiving the actual treatment
  – Control group: consists of subjects that are not receiving any treatment (for comparison only)

• Randomized controlled study: subjects are assigned to the treatment group or control group randomly, in the hope that both groups are representative samples
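
Random assignment can be sketched in Python as a shuffle followed by a split. The subject labels, the group size of 20, and the even split are assumptions made for illustration, not details from the slides.

```python
import random


def randomize_groups(subjects, seed=None):
    """Shuffle the subjects, then split them into treatment and control halves."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical pool of 20 subjects.
subjects = [f"subject_{i}" for i in range(1, 21)]
treatment, control = randomize_groups(subjects, seed=7)
print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides who lands in which group, confounding variables (age, health, habits, etc) tend to be spread across both groups instead of piling up in one.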

Page 35: What is statistics?

Placebos

• Placebo: fake treatment intended to look like the real treatment

• Controlled placebo study: controlled study in which control group is given a placebo

• Placebo effect: just the idea of getting treatment can produce positive results

Page 36: What is statistics?

Don’t tell them about the placebo!

• Blind study: neither the members of the treatment group nor the members of the control group know to which of the two groups they belong

• Double-blind study: the scientists conducting the study don’t know either

Page 37: What is statistics?

Homework

• Read Chapter 13

• Answer the questions on the Vocabulary worksheet

• Exercises beginning on page 515: 1-4, 13, 17-25, 30-32, 45-48, 57-60, 70