
Francis Duckworth - Lies and Statistics


Francis Duckworth of Duckworth and Lewis method fame in cricket has written an elaborate exposition of the various statistical fallacies involved in our day to day reasoning.


Teaching Statistics. Volume 28, Number 3, Autumn 2006, pp. 84–89

© 2006 The Author. Journal compilation © 2006 Teaching Statistics Trust

Blackwell Publishing Ltd, Oxford, UK. Teaching Statistics, ISSN 0141-982X. Original Articles

The Royal Statistical Society Schools Lecture 2004: ‘Lies and Statistics’, Part 2

KEYWORDS: Teaching; Misinterpretations of statistics; Monty Hall; Coincidences; National Lottery.

Frank Duckworth

Stinchcombe, Gloucestershire, England. e-mail: [email protected]

Summary

This article concludes the serialization of the Royal Statistical Society’s Schools Lecture for 2004, on ‘Lies and statistics’. The transcript of the lecture, which commenced in the last edition of the journal, is here continued to its conclusion.

TAKE YOUR PICK

In Part 1, I ended by illustrating the importance of having all the relevant information available when being asked to make a decision. I’m now going to give another example that makes the same point but not so obviously.

It’s the ‘three-box problem’, sometimes known as the ‘Monty Hall problem’, possibly because it occurred in an American TV game show in which a man of this name was the host. [Many authors have written about the Monty Hall problem; see, for example, Eisenhauer 2000 in this journal.]

The game is a simplified version of ‘Take your pick’, which is a game show that was on our television about 40 years ago, where competitors had to answer some fairly simple general knowledge questions and if they got them right then they were invited to go over to a stack of 10 boxes, pick one of these and take whatever prize was in this box. Many boxes had very valuable prizes, but what made the game exciting was that some of the boxes contained booby prizes which had no value at all. In this version of the game, there are just three boxes, which we shall call Box A, Box B and Box C. One of these contains a fantastically valuable prize, like a diamond bracelet or a voucher for a brand-new Mercedes car. But the other two contain booby prizes, like a piece of coal or a lettuce; for simplicity let’s just say that these other two boxes are empty. The host asks the contestant to make just one choice of box and whatever is in that box will be his prize. So you can see quite clearly that he’s got one chance in three of winning the prize. Let us say he selects Box A.

The host then picks up the key to Box A and walks towards it to unlock it and reveal the contents. But just before he does so, he stops and turns to the contestant, and he says ‘You’ve picked Box A. Now I’m not going to tell you whether Box A contains the prize or whether it’s empty. But what I am going to tell you is that Box B does not contain the prize’ – and to prove it he goes over to Box B and opens it and it is indeed empty. He then offers the contestant the chance of changing his mind.

To switch or not to switch; that is the question. Should he stick with Box A or should he change to Box C? [There are always a few who are already acquainted with the problem and they invariably declare that he should switch to Box C.]

This is the dilemma. There are two apparently quite equal ways of looking at the problem and they give different answers. One way is to argue that all the host has done is to change the game from a three-box game to a two-box game, so it’s just as if we were starting from scratch with only two boxes to choose from, and so the chances are one in two whether the prize is in either box; so it’s immaterial whether he changes or not. The other way of looking at it is to note that when he made his choice the chances were one in three that he was right and the prize was in Box A and two in three that he was wrong and the prize was in one of the others, that’s Box B or Box C. What the kind host has done is to eliminate the possibility that the prize is in Box B, so all the two in three probability of not being in Box A is now concentrated in Box C; in other words, there’s now a two in three chance of it being in Box C, and so he should switch.

When this problem appeared in Racing Times there was much correspondence, and in all this correspondence it was the switchers who won the day. Professor Ian Stewart of the University of Warwick, in his Royal Institution Christmas lectures to young people a few years ago, also came down on the side of switching, and I heard that he even gave switching as the answer to Sue Lawley when he appeared on Desert Island Discs shortly afterwards. [At this point I ask if anyone has read Mark Haddon’s excellent book The Curious Incident of the Dog in the Night Time in which a whole chapter is devoted to explaining why switching was the correct option.]

But Ian Stewart was wrong. Mark Haddon was wrong. Everyone was wrong. Those who said ‘switch’ were wrong; those who said ‘don’t switch’ were wrong.

The truth is that a mathematician or statistician cannot help you. The problem isn’t properly defined because you aren’t given the information you need to solve it.

The big unknown is this: What had the host in mind when he made his switching offer? What was his strategy? He knew which box contained the prize and which boxes didn’t. So, when the contestant selected Box A, did the host suddenly decide that he would confuse matters by opening an empty box, or had he been told by his producer to reveal Box B anyway?

Suppose the latter. Let us suppose that the host had decided in advance that whichever box the contestant picked, he would reveal the contents of Box B, which he knew to be empty. If the contestant selected Box B, he would open it and say ‘I’m sorry; it’s empty’. But if one of the other two, A or C, were selected, he’d reveal Box B to be empty and offer the opportunity for a switch. If this was the host’s strategy, then he’s added no new information to favour C over A and so it’s immaterial whether the contestant should switch or not.

But now suppose the host had decided that whatever choice were made, he’d select an empty box and open it gratuitously before offering the contestant the chance to change. (Whether the original selection were right or wrong, there must always be at least one empty box among the two not selected.) In this case, he has actually provided additional information, and he has increased the chances in the eyes of the contestant of the prize being in the other box. So in this situation he should switch.

But there are two other strategies the host may have had. He may have been nice or he may have been nasty. Suppose he was a nice host. Suppose he really wanted the man to win and had decided that if the contestant selected correctly he would say nothing but if he got it wrong he would open an empty box and give him the chance of changing. If that were the situation, then the contestant should definitely switch.

But suppose he was a nasty host and didn’t want the man to win. In that case he would only open the empty box and allow him to change if the original selection were correct. So if that were the host’s strategy, he should definitely not change.

So there are four possible strategies. The correct decision depends on which of them the host was following, and this we just do not know. The way the problem was presented, there is no mathematical solution. It was not a mathematical problem; it was a psychological one.
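The effect of the four strategies can be checked with a short calculation. The following is a minimal sketch in Python, not part of the lecture: the function names are hypothetical, the contestant is assumed to have picked Box A, and where the host has two empty boxes to choose from he is assumed to open one at random. For the first strategy it is further assumed that no box is opened if Box B happens to hold the prize, a case the lecture does not describe.

```python
from fractions import Fraction

BOXES = "ABC"
PICK = "A"  # the contestant's first choice, as in the lecture


def fixed_box_b(prize):
    """Strategy 1: reveal Box B whatever was picked, provided it is empty."""
    # Assumption: if Box B held the prize, no box would be opened at all.
    return {"B": Fraction(1)} if prize != "B" else {None: Fraction(1)}


def always_open_empty(prize):
    """Strategy 2: always open an empty box among the two not picked."""
    # Assumption: with two empty boxes available, the host picks one at random.
    empties = [b for b in BOXES if b not in (PICK, prize)]
    return {b: Fraction(1, len(empties)) for b in empties}


def nice_host(prize):
    """Strategy 3: open an empty box only when the first pick is wrong."""
    return always_open_empty(prize) if prize != PICK else {None: Fraction(1)}


def nasty_host(prize):
    """Strategy 4: open an empty box only when the first pick is right."""
    return always_open_empty(prize) if prize == PICK else {None: Fraction(1)}


def p_switch_wins(strategy, opened="B"):
    """P(prize is in the remaining box | host opened `opened`), by Bayes' rule."""
    remaining = next(b for b in BOXES if b not in (PICK, opened))
    # Joint probability of each prize location and the observed host action.
    joint = {prize: Fraction(1, 3) * strategy(prize).get(opened, Fraction(0))
             for prize in BOXES}
    return joint[remaining] / sum(joint.values())


for strategy in (fixed_box_b, always_open_empty, nice_host, nasty_host):
    print(f"{strategy.__name__:18s} P(switch wins) = {p_switch_wins(strategy)}")
```

Under these assumptions the chance that switching wins comes out as 1/2, 2/3, 1 and 0 for the four strategies in turn, which matches the verdicts given above: immaterial, switch, definitely switch, definitely do not switch.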

WISE AFTER THE EVENT

Another apparent source of lies from statistics comes from sifting through lots of statistics and finding something of note and then using probabilities to prove that it is meaningful. A good example of this is a letter to The Times in 1963. Someone had gone through the daily rainfall statistics since the end of the Second World War and shown that the wettest day of the week was Thursday, the day of Thor, god of thunder, whereas the driest day of the week was Sunday, the day of the Sun.

The writer claimed that his observation was statistically significant because the chances of both Thursday being the wettest day and Sunday being the driest were only 1 in 42 whereas any probability less than 1 in 20 is regarded as being statistically significant. Then in 1995 there was another letter to The Times again making the same point. We now had figures for about 50 years, and these again showed that Thursday was the wettest day and Sunday the driest.

But the mistake this later letter writer made was including the figures up to 1963, which had given rise to the original letter. In fact, if he had just looked at the figures between 1963 and 1993, he would have found that Thursday had now dropped to the fourth wettest day and Sunday was no longer the driest day.

What you cannot do is to note some apparently significant set of statistics and then ask a statistician to tell you the probability. When an event has happened, there is no meaning at all to the probability of it happening. The probability of something happening that has already happened is 100%. Of course, if you’d asked the question before the event happened that would be different, but when you’ve made your observation and spotted something that you think might have some real significance you cannot use the data that gave rise to the suggestion to test its significance; you must wipe the slate clean and start to collect completely new data.
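The 1 in 42 figure quoted in the 1963 letter can be reproduced by enumeration. The sketch below is an illustration rather than anything from the lecture, and it assumes that, with no real day-of-week effect, every ranking of the seven days from wettest to driest is equally likely.

```python
from fractions import Fraction
from itertools import permutations

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Every ranking of the seven days from wettest (first) to driest (last).
rankings = list(permutations(DAYS))

# Rankings matching the pattern named in the letter.
hits = sum(1 for r in rankings if r[0] == "Thu" and r[-1] == "Sun")
p_named_pair = Fraction(hits, len(rankings))

# Number of distinct (wettest, driest) pairs that could have been observed.
possible_pairs = len({(r[0], r[-1]) for r in rankings})

print(p_named_pair)    # 1/42
print(possible_pairs)  # 42
```

The second number is the point of the section: there are 42 possible (wettest, driest) pairs and one of them is bound to turn up, each exactly as ‘unlikely’ as the others. The 1 in 42 only carries weight if Thursday and Sunday are named before the data are examined.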

WHAT A COINCIDENCE!

Everything that happens in this world is truly remarkable when you look at the combination of unlikely events that have had to occur to make it happen. The fact that each and every one of us has come into the world at all is really quite remarkable.

With the benefit of hindsight, one can find some remarkable coincidences, much more so than Thursday being wet and Sunday dry. Consider, for example, the case of Mrs Evelyn Adams of New Jersey who won the state lottery jackpot twice within the space of a year. A reporter telephoned a professor of statistics at the local university (Rutgers University) and asked him some questions about probability. From the answers he reported that Mrs Adams had defied odds of 17,400 billion to one!

What this professor had actually been asked to work out was the chance of one prespecified combination of numbers coming up in two prespecified draws; for each draw the odds were 4.2 million to one against. So for two draws you multiply that number by itself and get 17 million million. That was, of course, completely the wrong calculation.

First of all, Mrs Adams wasn’t buying one ticket; it seems she was buying about 100 tickets every week. Then we must ignore the fact that she won it the first time; someone had to win, and for the sake of the probability calculation, it doesn’t matter to us whether it was Mrs Adams or Mrs Smith or Mr Jones. The question to ask is ‘What is the probability that someone somewhere, who buys 100 tickets a week, will win their state lottery jackpot twice in the space of say a year?’ If you do the sums, you can quite easily show that it was more likely than not that eventually this event would happen.
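The contrast between the two calculations can be sketched as follows. Only the 4.2 million to one odds and the 100 tickets a week come from the lecture; the weekly draw, the number of heavy players and the number of years are hypothetical inputs chosen purely for illustration, so the final figure shows the shape of the argument rather than a result.

```python
ODDS_PER_TICKET = 4_200_000  # from the lecture: 4.2 million to one against
TICKETS_PER_DRAW = 100       # from the lecture: about 100 tickets every week
DRAWS_PER_YEAR = 52          # assumption: one draw per week

# The reporter's figure: one prespecified ticket winning two prespecified draws.
naive_odds = ODDS_PER_TICKET ** 2

# Chance that a 100-ticket player wins any single draw.
p_draw = TICKETS_PER_DRAW / ODDS_PER_TICKET


def p_two_or_more_wins(p, n):
    """Binomial probability of at least two wins in n independent draws."""
    return 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)


p_player_year = p_two_or_more_wins(p_draw, DRAWS_PER_YEAR)


def p_someone(players, years):
    """Chance that at least one of `players` heavy buyers does it in `years` years.

    Both arguments are hypothetical inputs; years are treated as independent
    blocks, which is a simplification.
    """
    return 1 - (1 - p_player_year) ** (players * years)


print(f"naive odds: {naive_odds:,} to one")
print(f"one heavy player, one year: about 1 in {1 / p_player_year:,.0f}")
# Purely illustrative numbers, not taken from the lecture.
print(f"100,000 such players over 20 years: {p_someone(100_000, 20):.2f}")
```

With inputs of this order the chance that someone, somewhere, wins twice within a year comes out well above one-half, even though the chance for a single named player in a single named year remains tiny.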

You must never pick on something that has happened and ask a statistician to tell you the probability. Well you can do it, but you can’t make any use at all of the answer. You can’t use it, for example, to say that supernatural forces must have been at work or that there has been some trickery afoot.

I’ve heard many cases of amazing coincidences, and I expect some of you could produce your own examples. For instance, when I was a student in Liverpool in the 1960s I lived in lodgings where there were 12 other students. We all went off for the summer vacation and when we came back we heard the remarkable story that two of them had bumped into one another on a bus in Salt Lake City. And at Berkeley (Gloucestershire, UK) where I worked, I heard of a case where two of the staff met accidentally in the elevator going down to visit the Hoover Dam; and of another case where two met on a beach in Menorca, and another where two met in a remote fishing village in Portugal. There are many instances of remarkable coincidences and I’ve classified them into three types.

Type 1 is where the actual probability of occurrence is deceptively much higher than one’s instinct dictates. A good example of this is the familiar birthday puzzle where most people cannot imagine that you only need to have 23 in a room for there to be a 50/50 chance that two will share a birthday. [Time permitting, I pick off about 20 students and ask them to tell me what they think are the chances of two of them having the same birthday – on one occasion this backfired on me, as two girls in the group isolated revealed that they were twins!]
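The figure of 23 can be verified directly. This is a small sketch that assumes 365 equally likely birthdays and people chosen independently of one another (so no twins); the function name is illustrative.

```python
def p_shared_birthday(n, days=365):
    """Chance that at least two of n people share a birthday."""
    p_all_different = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1 - p_all_different


for n in (20, 22, 23):
    print(n, round(p_shared_birthday(n), 3))
```

For 20 people, the size of group picked out in the lecture, the chance is about 0.41; it first passes one-half at 23, where it is about 0.507.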


A Type 2 coincidence is where it is apparently amazing but it has selected itself in retrospect from the vast number of things that happen – for instance, the chance encounters referred to above, although even these have a bit of Type 1 in them as the people were of a similar type and had similar lifestyles.

And that leaves what I call a Type 3 coincidence. This is where it really is totally amazing and one feels it necessary to invoke the existence of supernatural forces or perhaps deliberate trickery. If any of you know of something you think might be a Type 3 coincidence, please let me know.

PLAYING WITH CHANCE

You’ve probably already guessed that I’m not a gambler; because in the long run you’re bound to lose. [I then show a picture of the strip at Las Vegas and invite students to identify it, which many do!] I have been a total of four times to that incredible city and I can honestly admit that the only money I have ever lost there was in the stamp machine! Let me give you some very good advice – never gamble in casinos; they make an awful lot of money and it all comes from the people who gamble.

But there is one very common form of gambling that I am sure most of you indulge in, or will do so eventually, which is the National Lottery, and I know that you are all waiting for me to tell you how to win [the synopsis of the talk that has been circulated beforehand mentions that I will explain how to increase one’s expected lottery winnings]. Well if you want sound financial advice, you’ll employ my strategy and never buy a single ticket.

But it is only because of my convictions as a logical mathematician that I never buy one. I save £2 a week, but I forgo that extraordinary pleasure of not winning several million pounds twice a week. It is quite rational to buy just one ticket regularly although it is not strictly logical; but it is not rational if you spend large sums every week. In the long run, and it has to be the very long run indeed, long enough for you to win the jackpot at least once, you will get back 45% of what you staked, which is slightly better than the football pools where you get back about a third of your stake.

So accepting that I do not criticize anyone for doing the lottery with the minimum stake, what advice can I give you as a statistician?

Let us get one thing quite clear. There is nothing that I or anyone else can do to help you to increase your chances of picking the right numbers. So take no notice at all of any claims to help you pick the winning numbers. But there is something I can do for you. I can’t help you to win, but I can help you to win more if you win.

The key to this lies in people’s selection of random numbers. If everyone who bought a ticket picked their six numbers from the numbers 1 to 49 completely at random, then there would be about two or three jackpot winners every draw, the number of winners would hardly ever exceed 15 and there should have been no more than about 40 roll-overs since the lottery started about 10 years ago [when there are no jackpot winners, the prize money rolls over to the next draw]. In fact, there have been over 130 roll-overs, there have been several instances of more than 20 winners, and on one occasion during the lottery’s first year there were no fewer than 133 winners.

What we find is that in most draws there are either many fewer winners or many more winners than would be expected if every selection were made purely at random. So most people who win a share in the jackpot are having to share it with many others. The reason is that people in general are not very good at picking random numbers. To show you what I mean, let’s carry out a little experiment. [In practice this experiment is carried out as an introduction to the talk, and the slips of paper are kept hidden until this point.]

THINK OF A NUMBER

What I want you to do is to pick some random numbers as follows. [A slip of paper containing what is described below was handed to everyone as they came in.]

Item 1. There are four numbers written in a square: 1, 2, 3 and 4. Will you please select one of these numbers at random, 1, 2, 3 or 4, and put a cross over the number you have selected.

Item 2. Below this you have an empty square and I want you to pick a single-digit figure at random, either 1 2 3 4 5 6 7 8 9 or 0, and write it in this square.

Item 3. I want you to make a random National Lottery selection of six different numbers between 1 and 49; put them in order and write them in the spaces provided.

Let us now look at the results of the experiment and see how good you are at picking random numbers. And I am going to be rather bold and guess that you’re not very good.

For item 1 you had four numbers to pick from, so on average a quarter of you should have picked each number. I’m going to speculate that many more than a quarter of you have picked the number 3. [I ask for hands raised if so, and invariably nearly one-half of the audience have selected the number 3.]

For item 2 you had 10 numbers to pick from so about a tenth of you should have picked each number. Raise your hand if you’ve picked the number 7 [a great many more than one-tenth, nearer one-third, pick the number 7].

And of course to have picked both 3 and 7 at random should happen once in 40 times. How many of you here have picked both 3 for the first item and 7 for the second? [About one in 10 have picked these two numbers.]

Let us now look at your lottery selections. Will you all please make sure that your numbers are in ascending order. Now look through your six numbers and see if any two of them are consecutive. If you have at least one instance of two (or more) consecutive numbers, please raise your hand [many more hands down than hands raised].

As I expected, far fewer than half of you have chosen six numbers where two are consecutive. What I have to tell you is that with truly random selections there will be two consecutive numbers in very nearly one-half of the draws. Indeed, from the 845 draws since the lottery started in November 1994 until the end of January 2004, there were consecutive numbers in 402 of them, which is about 48% of the draws. Examination of the numbers of winners tells us that when there are consecutive numbers there are fewer people sharing the jackpot. So there is the first piece of advice I can give you; make sure you have two consecutive numbers.
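The ‘very nearly one-half’ can be worked out exactly. The sketch below is not from the lecture; it relies on the standard counting result that the number of ways to choose k numbers from 1 to n with no two adjacent is C(n − k + 1, k), which for six numbers from 49 gives C(44, 6).

```python
from math import comb

total = comb(49, 6)           # all possible six-number selections
no_consecutive = comb(44, 6)  # selections with no two numbers adjacent

p_consecutive = 1 - no_consecutive / total
observed = 402 / 845          # draws with consecutive numbers, from the lecture

print(f"theoretical: {p_consecutive:.4f}")
print(f"observed:    {observed:.4f}")
```

The theoretical value is a little over 0.495, and the observed 402 out of 845 is a little under 0.476, which is the ‘about 48%’ quoted above.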

What you want to do to make sure you win a lot, if you win, is to pick numbers or combinations of numbers that few others pick. Camelot, which runs the UK National Lottery, does not reveal any information on the numbers people select, so we cannot look for the most popular numbers or combinations of numbers. But there are other identical lotteries in the world, notably in Switzerland and Canada, and these do make the details of selections available, and they tell us some very interesting things. First of all, the most popular number is 7 [many more than the expected one-eighth raise their hands rather guiltily on being asked]; next comes 11 [same again]. Odd numbers are more popular than even, and fewer than expected pick numbers over 40.

There are also interesting results from looking at combinations of numbers. Did anyone here pick the numbers 1, 2, 3, 4, 5 and 6? [Almost always there are two or three who admit to it.] What I have to tell you is that if you had picked these numbers and they had come up, the indications are that you would probably have had to share your jackpot winnings with about 10,000 others!

So here is my advice. First of all, you must pick your six numbers at random. The best way of doing this is to measure out a 7-inch by 7-inch square, rule this into 49 squares, write the numbers 1 to 49 in the squares, cut them out separately and fold each one twice. Then put them in a hat, mix them thoroughly and draw out six numbers. But here is the trick. Put them in ascending order and examine them. To accept them they must satisfy each of the following four criteria:

1. there must be at least one instance of two or more consecutive numbers

2. there must not be a 7 or an 11

3. there must be no more than three odd numbers

4. there must be at least one number of 41 or over

Otherwise, put all six numbers back, mix them thoroughly again, and redraw; and keep doing this until all four conditions are satisfied.
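For anyone without a hat to hand, the draw-and-redraw procedure above can be sketched in a few lines of Python. This is only an illustration of the four criteria as stated; the function names are hypothetical and a pseudo-random generator stands in for the folded slips of paper.

```python
import random


def acceptable(numbers):
    """Check a sorted six-number selection against the four criteria."""
    has_consecutive = any(b - a == 1 for a, b in zip(numbers, numbers[1:]))
    avoids_7_and_11 = 7 not in numbers and 11 not in numbers
    few_odd = sum(n % 2 for n in numbers) <= 3
    has_high = any(n >= 41 for n in numbers)
    return has_consecutive and avoids_7_and_11 and few_odd and has_high


def pick_numbers(rng=random):
    """Redraw six numbers from 1 to 49 until all four criteria are met."""
    while True:
        # Draw six different numbers and put them in ascending order.
        numbers = sorted(rng.sample(range(1, 50), 6))
        if acceptable(numbers):
            return numbers


print(pick_numbers())
```

Putting all six numbers back and redrawing, rather than adjusting a rejected selection by hand, keeps every acceptable combination equally likely, so no human preference creeps back in.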

This will not make any difference to your chances of winning. You are still facing odds of nearly 14 million to one against, but in the very unlikely event that you do win, at least you probably won’t have to share your winnings with very many others.


[Readers of this journal may also care to note the article by Helman (2004).]

[Before sitting down, I explain how they might like to consider a career in statistics and how the Royal Statistical Society can provide help and advice. And finally I thank them for being a lively and cooperative audience, which was almost invariably the case.]

References

Eisenhauer, J.G. (2000). The Monty Hall matrix. Teaching Statistics, 22(1), 17–20.

Helman, D. (2004). In all probability, probability is not all. Teaching Statistics, 26(1), 26–28.