Statistics – OR 155, Section 1
J. S. Marron, Professor
Department of Statistics
and Operations Research
Class Information
Handouts: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassInfo/Stor155-09FirstHandout.pdf
With:
• Blackboard Info
• Student Survey
(please fill out & return after class)
Class Information
Go to Blackboard (for class details):
• Website: http://blackboard.unc.edu/
• Log-in with Onyen
• Choose this course
• Control Panel > Content Areas
• Course Information
• Choose Item “Course Information”
Relationship to Textbook
• Ordering of material in textbook is usual
• But I don’t like it
(poorly motivated)
• So will change the order of the material
(for better motivation)
• Will jump around a lot through the text
Reading In Textbook
Approximate Reading for Today’s Material:
Pages 1-5, 197-203, 203-208
Approximate Reading for Next Class:
Pages 237-250
What is Statistics?
Definition 1:
Gaining Insight from Numbers
(similar to text’s definition)
Definition 2:
The Science of Managing Uncertainty
What is Statistics?
Subtopics:
• Gathering the Numbers
– E.g. Statistician at a ball game
– Will see: how this is done is critical
• Forming Conclusions
– Will use math, etc.
– Major focus of this course
Key Themes
I. Uncertainty
II. Variability
(will get quantitative about these)
Favorite Quote:
“I was never good at math, but statistics is
easy, since it is just common sense”
Motivating Examples
1. Political Polls
– Try to predict outcome of election
– Too expensive to ask everyone
– So ask some (hope they are “representative”)
2. Measurement Error
– No measurement is exact
– Can improve by multiple measurements
– How to model?
Lessons of these are broadly applicable
Common Structure
For both, find out about truth from a sample
E.g. 1: truth = % for Cand. in population; sample = % for Cand. in sample
E.g. 2: truth = true size; sample = observed measurement
Motivating Examples
1. Political Polls
2. Measurement Error
Will study each using mathematical models
Do E.g. 1 first, since easier
Appropriate Models?
Political Polls
Appropriate Mathematical Models?
Depends on how data are gathered.
See Text, pages 171-177
• Seems easy???
• “Just choose some”???
• Take a look at history…
How to sample?
History of Presidential Election Polls
During campaigns, constantly hear in news “polls say …” How good are these? Why?
1936 Landon vs. Roosevelt Literary Digest Poll: 43% for R
Result: 62% for R
What happened?
Sample size not big enough? 2.4 million
Biggest poll ever done (before or since)
Bias in Sampling
Bias: Systematically favoring one outcome
(need to think carefully)
Selection Bias: Addresses from L. D.
readers, phone books, club memberships
(representative of population?)
Non-Response Bias: Return-mail survey
(who had time?)
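A small simulation can make the selection bias point concrete. The sketch below is not the Literary Digest data: the population, the share of "listed" voters, and the support levels are all made-up assumptions, chosen only so that the listed group differs from the population as a whole. It compares a very large poll drawn only from listed voters against a small poll in which everyone is equally likely to be chosen.

```python
import random

def compare_polls(seed=0):
    """Return (true %, big biased poll %, small random poll %) for candidate R."""
    rng = random.Random(seed)
    # Hypothetical population of 200,000 voters (all numbers are made up):
    # about 30% are "listed" (on the mailing lists); support for R is
    # assumed to be 45% among listed voters and 70% among everyone else.
    population = []
    for _ in range(200_000):
        listed = rng.random() < 0.30
        supports_r = rng.random() < (0.45 if listed else 0.70)
        population.append((listed, supports_r))

    def pct(votes):
        return 100 * sum(votes) / len(votes)

    everyone = [supports for _, supports in population]
    listed_only = [supports for listed, supports in population if listed]

    true_pct = pct(everyone)
    big_biased = pct(rng.sample(listed_only, 50_000))  # huge, but listed voters only
    small_random = pct(rng.sample(everyone, 1_000))    # small, but equally likely
    return true_pct, big_biased, small_random

true_pct, big_biased, small_random = compare_polls()
print(f"truth {true_pct:.1f}%, big biased poll {big_biased:.1f}%, "
      f"small random poll {small_random:.1f}%")
```

Under these assumptions the large poll stays far from the truth no matter how big it is, while the small random poll lands close to it. A bigger sample does not repair a biased way of choosing.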
How to sample?
1936 Presidential Election (cont.)
Interesting Alternative Poll:
Gallup: 56% for R (sample size ~ 50,000)
Gallup's prediction of the L. D. poll: 44% for R (sample size ~ 3,000)
Predicted both correct result (62% for R),
and L. D. error (43% for R)!
(how was improvement done?)
Improved Sampling
Gallup’s Improvements:
(i) Personal Interviews
(attacks non-response bias)
(ii) Quota Sampling
(attacks selection bias)
Quota Sampling
Idea: make “sample like population”
So surveyor chooses people to give:
i. Right % male
ii. Right % “young”
iii. Right % “blue collar”
iv. …
This worked fairly well (~5% error), until …
How to sample?
1948       Dewey   Truman   sample size
Crossley   50%     45%
Gallup     50%     44%      ~50,000
Roper      53%     38%      ~15,000
Actual     45%     50%      -
Note: Embarrassing for polls, famous photo of Truman + Headline “Dewey Defeats Truman”
What went wrong?
Problem: Unintentional Bias
(surveyors understood bias,
but still made choices)
Lesson: Human choice cannot give a representative sample
Surprising Improvement: Random Sampling
Now called “scientific sampling”
Random = Scientific???
Random Sampling
Key Idea: “random error” is smaller than
“unintentional bias”, for large enough sample sizes
How large?
Current sample sizes: ~1,000 - 3,000
Note: now far fewer than the ~50,000 used in 1948.
So surveys are much cheaper
(thus many more done now…)
Random Sampling
How Accurate?
• Can (& will) calculate using “probability”
• Justifies term “scientific sampling”
• 2nd improvement over quota sampling
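As a preview of the kind of calculation meant here, the sketch below uses the standard normal-approximation "margin of error" for a sample percentage under simple random sampling, at roughly 95% confidence. This formula is an assumption brought in for illustration; the course may develop the calculation differently. The function name `margin_of_error` and the worst-case choice `p = 0.5` are likewise illustrative.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a sample proportion.

    Assumes a simple random sample of size n from a large population;
    p = 0.5 is the worst case, z = 1.96 is the usual normal cutoff.
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)

for n in (1_000, 3_000, 50_000):
    print(f"n = {n:>6,}: about +/- {margin_of_error(n):.1f} points")
```

With these assumptions, a random sample of 1,000 is accurate to about 3 percentage points and one of 3,000 to under 2. Going all the way to 50,000 would tighten this to under half a point, which is rarely worth the cost.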
Random Sampling
What is random?
Simple Random Sampling:
Each member of population is
equally likely to be in sample
Key Idea: Different from “just choose some”
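A minimal sketch of a computer-drawn simple random sample, using Python's standard `random.sample`, which draws without replacement so that every member of the list has the same chance of being included. The roster names and the helper `simple_random_sample` are hypothetical, for illustration only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; each member is equally likely to be included."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(list(population), n)

# Hypothetical roster of 25 students
roster = [f"student_{i:02d}" for i in range(1, 26)]
print(simple_random_sample(roster, 5, seed=155))
```

The point of handing the choice to the computer is that no human preference, conscious or not, enters into who gets picked.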
Random Sampling
An old (but still fun?) experiment:
Choose a number among 1,2,3,4
Old typical results: about 70% choose “3”
(perhaps you have seen this before…)
Main lesson: human choice does not give an “equally likely” (i.e. random) sample
Random Sampling
How to choose a random sample?
Old Approaches:
– Random Number Table
– Roll Dice
Modern Approach:
– Computer Generated
Random Sampling HW
Interesting Question:
What is the % of Male Students at UNC?
(Your chance of a date,
or subtract from 100% to get your chance)
HW:
C1: Class Handout: http://stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/HWAsst/Stor155HWC1.pdf
Random Sampling HW
Notes on HW C1:
• 3 dumb ways to sample, 1 good one
• Goal is to learn about sampling,
not “get right answer”
• Part 1, put symbol for yourself, Ms and Fs for others
• Put both count & % (% = 100 x count / 25)
• Part 2, “tally” is:
• Part 4, student phone directory available in Student Union?
Random Sampling HW
Notes on HW C1,
• Hints on Part 4:
– For each draw, first draw a “random page”
– Tools > Data Analysis > Random Number Generation > Uniform is one way to do this
– In “Uniform”, you need to set “Parameters” to 0 and “number of pages”
– This gives a random decimal; to get an integer, round up, using CEILING
– In CEILING, set “significance” to 1
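For anyone who wants to check the logic outside the spreadsheet, the "uniform decimal, then round up" recipe can be sketched in Python. This is an analogue of the spreadsheet steps above, not a replacement for them; the function name `random_page` and the page count of 200 are assumptions for illustration.

```python
import math
import random

def random_page(number_of_pages, rng):
    """Pick a page number 1..number_of_pages, each equally likely."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1]
    # and the decimal below lies in (0, number_of_pages].
    decimal = number_of_pages * (1.0 - rng.random())
    # Rounding up to the next whole number plays the role of CEILING
    # with significance 1.
    return math.ceil(decimal)

rng = random.Random(155)
print([random_page(200, rng) for _ in range(5)])
```

Rounding up (rather than rounding to nearest) matters: it gives every page an interval of the same length, so the first and last pages are not short-changed.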
Random Sampling HW
Notes on HW C1,
• Hints on Part 4 (cont.):
– Next Choose Random Column
– Next Choose Random Name
– Caution: Different numbers on each page.
– Challenge: still make equally likely
– Approach: choose larger number
– Approach: when not there, just toss it out
– Approach: then do a “redraw”
– Also redraw if can’t tell gender
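The "choose larger number, toss it out, redraw" approach can be sketched as follows. The tiny directory is made up, with uneven pages and columns on purpose, and `draw_name` is a hypothetical helper. The idea is to draw the column and row positions from the largest possible ranges, and to redraw the whole thing whenever the chosen position is empty, so every name ends up with the same chance.

```python
import random
from collections import Counter

def draw_name(directory, rng):
    """Draw one name so that every name in the directory is equally likely.

    directory is a list of pages; each page is a list of columns;
    each column is a list of names. Pages and columns may be uneven.
    """
    max_cols = max(len(page) for page in directory)
    max_rows = max(len(col) for page in directory for col in page)
    while True:
        page = directory[rng.randrange(len(directory))]  # random page
        col_idx = rng.randrange(max_cols)                # use the larger number
        row_idx = rng.randrange(max_rows)                # use the larger number
        if col_idx < len(page) and row_idx < len(page[col_idx]):
            return page[col_idx][row_idx]
        # Position not there: toss it out and redraw from scratch.

# Made-up directory: page 1 has columns of 3 names and 1 name; page 2 has 1 name.
directory = [[["a", "b", "c"], ["d"]], [["e"]]]
rng = random.Random(155)
print(Counter(draw_name(directory, rng) for _ in range(20_000)))
```

Without the redraw step, the lone name on the short page would be chosen about half the time. The same redraw idea covers names whose gender cannot be determined: toss the draw and start again.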