Think: Bing It On!
Compares Bing to Google
How would you design this? Tell me:
Me? And I’m guessing:
Hypothesis: Students in Toronto do not prefer one search engine (SE) to another.
How? 100 Senecans will be surveyed by 10 paid surveyors. Asked to compare two frames with fonts, colours, and text sizes randomized. Search terms the Senecans choose. They choose the frame they like best: Google or Bing. Results not revealed to participants.
Why? Identify the sample and the population I’m trying to sample. Remove my bias by asking surveyors. Surveyors will not know how the survey is designed: “double blind”.
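The randomization above can be sketched in code. This is a hypothetical sketch, not the actual survey tool: the function and field names are mine. Frames are labelled only A and B, cosmetics are randomized, and the engine-to-frame mapping stays hidden until analysis.

```python
import random

def make_trial():
    """Build one double-blind comparison: two frames, A and B, with the
    engine assignment and cosmetic styling randomized per participant."""
    engines = ["Google", "Bing"]
    random.shuffle(engines)  # which engine appears in which frame
    style = {  # randomized so fonts/colours/sizes can't give the engine away
        "font": random.choice(["Arial", "Georgia", "Verdana"]),
        "size_px": random.choice([12, 14, 16]),
        "colour": random.choice(["#222", "#333", "#444"]),
    }
    # The mapping is kept out of the survey form, so neither participants
    # nor surveyors can tell which frame is which engine.
    return {"A": engines[0], "B": engines[1], "style": style}

trial = make_trial()
choice = "A"               # the frame the participant liked best
preferred = trial[choice]  # decoded only during analysis
```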
Why 100? 10 is too few; 1000 is too many.
“For sufficiently large n, the distribution of p̂ will be closely approximated by a normal distribution with the same mean and variance.[1] Using this approximation, it can be shown that around 95% of this distribution's probability lies within 2 standard deviations of the mean. Because of this, an interval of the form

p̂ ± 2√(0.25/n)

will form a 95% confidence interval for the true proportion. If this interval needs to be no more than W units wide, the equation

4√(0.25/n) = W

can be solved for n, yielding[2][3] n = 4/W² = 1/B², where B is the error bound on the estimate, i.e., the estimate is usually given as within ±B. So, for B = 10% one requires n = 100; for B = 5% one needs n = 400; for B = 3% the requirement approximates to n = 1000; while for B = 1% a sample size of n = 10000 is required. These numbers are quoted often in news reports of opinion polls and other sample surveys.”
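The quoted rule of thumb is easy to check numerically. A small sketch (the helper is my own, assuming z = 2 and the worst-case variance p(1 − p) ≤ 0.25 used in the quote):

```python
import math

def sample_size(bound, z=2.0, p=0.5):
    """Smallest n such that a z-standard-deviation interval for a proportion
    is within +/- bound, using the worst-case variance p(1 - p) <= 0.25."""
    return math.ceil(z ** 2 * p * (1 - p) / bound ** 2)

for b in (0.10, 0.05, 0.03, 0.01):
    print(f"B = {b:.0%}: n = {sample_size(b)}")
# Reproduces n = 1/B^2: 100, 400, 1112 (quoted as roughly 1000), 10000.
```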
“Sample Size Determination”
Say that works: 60 prefer Bing, 40 prefer Google.
What does that mean?
I have no idea! Well, sort of.
60% (±10%, p=.05) prefer Bing to Google
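As a sanity check (my own arithmetic, not from the slides), the normal-approximation interval for 60 of 100 can be computed directly:

```python
import math

n, bing = 100, 60
p_hat = bing / n
# 95% normal-approximation margin of error, using the observed p_hat
# rather than the worst case 0.5.
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"{p_hat:.0%} ± {margin:.1%}")  # 60% ± 9.6%
```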
You tell me, what does that mean?
Maybe nothing? Maybe something?
Look: that was as easy as it gets!
Population identification, sample size calculation, double blinding, within two standard deviations, after stripping CSS—all that before I do the statistics
Which I can’t understand!
Good methodology
● Design your experiment beforehand
● Run the experiment according to the design
● Without peeking
  – Or changing
● Collect all data
● Interpret all data
● Make all data available
● Analyze data according to good analysis principles.
Ducklings
You have no idea how to do this. No idea. Neither do I.
Questions
How many people do you need to survey? How do you test them? Double blind? Blind? What do you ask them?
You have to do this
● It’s too easy to fool yourself
Let’s review
Publish or perish? Who perishes? And where do they publish?
Journals
What are the most prestigious journals in the world? How do you know?
Impact factor
Nature
Proceedings of the National Academy of Sciences
Science
Physical Review Letters
Journal of the American Chemical Society
Physical Review B
Journal of Biological Chemistry
Applied Physics Letters
New England Journal of Medicine
Cell
(Eigenfactor.org data for 2011, most recent available)
Roughly:
Number of in-citations
Number of out-citations
But? Top-ranked are mostly medicine, with some physics. No computing journals in the top 100. Bioinformatics: 68.
Get published
Or get fired.
Science, Nature, Cell, NEJM, JAMA
You get ‘tenure’—never fired, made for life.
● Japanese researcher in anaesthesiology
  – Worked in Canada too
● Published 212 papers in 20 years (about one a month)
(Hmmmmm.)
Yoshitaka Fujii
You’ll never guess
He made them up.
● 172 are demonstrably false.
As an aside:
● Retractions still need work:
  – Of Fujii’s first ten articles on Google Scholar:
    ● 4 were clearly retracted
    ● 1 was less clearly retracted
    ● 5 were not labelled as retracted
Jan Hendrik Schön
● Nano-physics genius!
  – Won $100,000 as best young scientist
  – Published, at his best, one paper every eight days
    ● Including in Science and Nature
      – The very best journals in the world.
Now
● He has 10 friends on Facebook.
  – I’m one!
● Gave back his PhD. Disappeared.
You’ll never guess
● He made all of his data up.
  – [Movie time! 35:00]
So?
● What’s the problem?
● So they lied. Nobody died.
  – (Well, probably. Fujii was a doctor.)
As I see it
● Money
  – Millions of dollars
● Reputation
  – Bell Labs, universities, colleagues, students
● Work: Reid Chesterfield spent 5 years trying to replicate Schön’s work.
Mohammad
His supervisor spent months trying to replicate Schön’s work. (That’s hundreds of thousands of dollars.)
Another kind
● Damage to the scientific enterprise:
  – Science has to be open to catch cheaters
  – But openness makes researchers look bad
Kinds of fraud● Fabrication● Falsification● Other
Fraud
“Fabrication of data involves totally inventing a data set; falsification refers to manipulation of equipment or changing data such that the research is not accurately represented in the research report.” (Stroebe, Postmes, and Spears)
Fabrication● Pretty clear—you make up the data.
Falsification● Changing or interpreting the data:
“There is no rigid mathematical definition of what constitutes an outlier; determining whether or not an observation is an outlier is ultimately a subjective exercise.”
Outliers
● How do you deal with them?
  – Bill Gates walks in the room
    ● Median and mean income?
(How) Do you eliminate that variation?
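The effect is easy to demonstrate. A toy example (the incomes are invented for illustration):

```python
import statistics

incomes = [40_000, 45_000, 50_000, 52_000, 55_000, 60_000]
# Before the outlier: mean and median tell roughly the same story.
print(statistics.mean(incomes), statistics.median(incomes))

incomes.append(11_000_000_000)     # Bill Gates walks in the room
print(statistics.mean(incomes))    # mean jumps past a billion
print(statistics.median(incomes))  # median barely moves: 52000
```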
Data picking
● Say you want to show that monkeys flip a coin to heads more often than humans. How do you do it?
● Not investigate. Show
● 1) Each flip 1 coin 100 times
● 2) Each flip 10 coins 10 times
● 3) Each flip 100 coins 1 time
Then...
● Re-design your experiment!
Then...
● Monkeys and humans each flipped 10 coins....
A ha.
● This is (abuse) of methodology
  – And why I keep saying it matters!
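What that abuse buys you can be simulated. A sketch (entirely made up; fair coins for everyone, so there is no real difference): trying several designs and keeping the most favourable one manufactures a gap from chance alone.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

def run_design(flips_per_subject, subjects=10):
    """Heads rate for 'monkeys' minus 'humans'; both flip fair coins."""
    def heads(n):
        return sum(random.random() < 0.5 for _ in range(n))
    total = flips_per_subject * subjects
    monkeys = sum(heads(flips_per_subject) for _ in range(subjects))
    humans = sum(heads(flips_per_subject) for _ in range(subjects))
    return (monkeys - humans) / total

# The three designs from the slide; then keep whichever favours monkeys most.
diffs = {f: run_design(f) for f in (100, 10, 1)}
best_design = max(diffs, key=diffs.get)
print(diffs, best_design)
```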
Google Scholar vs MAS
What does that tell you?
Google Scholar vs MAS
● That GS has better searching than MAS
● Or that GS has worse searching than MAS!
Check this out
[Figure: scatter of “Column A” values (0 to 1) against x from 0 to 60, with a “Linear (Column A)” trend line]
Clearly
A strong trend: decreasing over x, despite what appear to be sinusoidal variations.
One problem: I made the data with random numbers
  – And a few tricks:
    ● No R value shown
    ● Lighten the points
    ● Darken the line
    ● Compress y for sharpness
    ● Regenerate the data if necessary
Also
Choose the line of best fit: linear? Moving average? Exponential? Log?
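The whole trick fits in a few lines. A sketch (my own least-squares helper; pure noise in, “trend” out): keep regenerating random data until the fitted slope looks convincingly negative, and never report R.

```python
import random

random.seed(0)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

xs = list(range(60))
slope = 0.0
while slope > -0.002:                   # "regenerate data if necessary"
    ys = [random.random() for _ in xs]  # pure noise, no trend at all
    slope, intercept = fit_line(xs, ys)
print(round(slope, 4))  # a convincingly "decreasing" slope from randomness
```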
Of course
● That’s not nearly the only way!
  – Repeat the whole experiment
  – Blinding
  – Survey design
  – Outlier elimination
● And so on.
So: it’s easy
It’s so, so easy to cheat!
Let’s do it:
Google vs Bing
Say you wanted to show that Bing > Google. How would you?
Population is, er, everyone!
Sample 1000 in Seattle
Sample young white men in Seattle
Redo sample!
Remove double blind
Remove single blind
10 in a row for Google? Outlier!
Choose best 100 of 1000 in Seattle
Repeat that ‘experiment’ to find the 20th out of 20.
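That last trick is just multiple comparisons. A simulated sketch (no real preference anywhere): at p = .05, roughly 1 honest experiment in 20 looks “significant” by chance, so running 20 and reporting only the best usually finds one.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

def experiment(n=100):
    """One honest survey under the null: nobody actually prefers Bing."""
    bing = sum(random.random() < 0.5 for _ in range(n))
    return bing / n

results = [experiment() for _ in range(20)]
best = max(results)
# Around 0.58 is the one-sided 5% cutoff for n = 100 (p_hat ~ N(0.5, 0.05)),
# so the best of 20 null runs clears it more often than not.
print(best)
```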
Why?
● Career pressure
  – Publish or perish
  – Past glories
● Overconfidence
● Temptation because of irreproducibility
How do they get caught?
● Data that is too good
● Drawing suspicion in publication
● Ratted out by underlings
Lessons:
● Don’t cheat well
● Don’t cheat much