1.3.1 Measuring Center: The Mean - The arithmetic average

1.3.1 Measuring Center: The Mean

Mean - The arithmetic average. To find the mean x̄ (pronounced “x-bar”) of a set of observations, add their values and divide by the number of observations. If the n observations are x_1, x_2, …, x_n, their mean is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Or

$$\bar{x} = \frac{\sum x_i}{n}$$

Actually, the notation x̄ refers to the mean of a sample. Most of the time, the data we’ll encounter can be thought of as a sample from some larger population. When we need to refer to a population mean, we’ll use the symbol μ (Greek letter mu, pronounced “mew”). If you have the entire population of data available, then you calculate μ in just the way you’d expect: add the values of all the observations, and divide by the number of observations.

Example – Travel Times to Work in North Carolina
Calculating the mean

Below is data on travel times of 15 North Carolina residents.

1) Find the mean travel time for all 15 workers.
2) Calculate the mean again, this time excluding the person who reported a 60-minute travel time to work. What do you notice?
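The two calculations in this example can be sketched in a few lines of Python. This is only an illustration: the 15 values are copied from the ordered list that appears in Section 1.3.2, and the names `travel_times`, `mean`, `mean_all`, and `mean_without_60` are chosen here, not taken from the handout.

```python
# Travel times (minutes) for the 15 North Carolina workers, copied from the
# ordered list shown in Section 1.3.2 of this handout.
travel_times = [5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60]


def mean(values):
    """Arithmetic average: add the values, then divide by how many there are."""
    return sum(values) / len(values)


# 1) Mean for all 15 workers.
mean_all = mean(travel_times)

# 2) Mean again, leaving out the one worker who reported 60 minutes.
without_60 = [t for t in travel_times if t != 60]
mean_without_60 = mean(without_60)

print(f"mean of all 15 workers: {mean_all:.2f} minutes")
print(f"mean without the 60:    {mean_without_60:.2f} minutes")
```

Removing a single observation moves the mean from about 22.5 minutes to about 19.8 minutes.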

The previous example illustrates an important weakness of the mean as a measure of center: the mean is sensitive to the influence of extreme observations. These may be outliers, but a skewed distribution that has no outliers will also pull the mean toward its long tail. Because the mean cannot resist the influence of extreme observations, we say that it is not a resistant measure of center.

Resistant Measure - A statistic that is not affected very much by extreme observations.

1.3.2 Measuring Center: The Median

Median - The median M is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger. To find the median of a distribution:

1. Arrange all observations in order of size, from smallest to largest.
2. If the number of observations n is odd, the median M is the center observation in the ordered list.
3. If the number of observations n is even, the median M is the average of the two center observations in the ordered list.

Example – Travel Times to Work in North Carolina
Finding the median when n is odd

What is the median travel time for our 15 North Carolina workers? Here are the data arranged in order:

5 10 10 10 10 12 15 20 20 25 30 30 40 40 60

The count of observations n = 15 is odd. The first 20 in the list (the 8th value) is the center observation in the ordered list, with 7 observations to its left and 7 to its right. This is the median, M = 20 minutes.
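The three-step recipe for the median translates almost directly into code. The sketch below is illustrative only: the function name `median` and the short list `even_example` are made up here, the latter purely to exercise the even-n rule.

```python
def median(values):
    """Return the median M by following the three steps in the text."""
    ordered = sorted(values)      # step 1: arrange from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                # step 2: n odd -> the center observation
        return ordered[middle]
    # step 3: n even -> average of the two center observations
    return (ordered[middle - 1] + ordered[middle]) / 2


# The 15 North Carolina travel times (n is odd).
nc_times = [5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60]
nc_median = median(nc_times)

# A made-up list with an even count, used only to show the n-even rule.
even_example = [2, 4, 6, 8]
even_median = median(even_example)

print("NC median:", nc_median)
print("median of the even-length example:", even_median)
```

For the North Carolina data the function returns the 8th ordered value, 20, in agreement with the example.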

Example – Stuck in Traffic
Finding the median when n is even

People say that it takes a long time to get to work in New York State due to the heavy traffic near big cities. What do the data say? Here are the travel times in minutes of 20 randomly chosen New York workers:

10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45

1. Make a stemplot of the data. Be sure to include a key.
2. Find and interpret the median.

1.3.3 Comparing the Mean and the Median

Our discussion of travel times to work in North Carolina illustrates an important difference between the mean and the median. The median travel time (the midpoint of the distribution) is 20 minutes. The mean travel time is higher, 22.5 minutes. The mean is pulled toward the right tail of this right-skewed distribution. The median, unlike the mean, is resistant. If the longest travel time were 600 minutes rather than 60 minutes, the mean would increase to more than 58 minutes but the median would not change at all. The outlier just counts as one observation above the center, no matter how far above the center it lies. The mean uses the actual value of each observation and so will chase a single large observation upward.

The mean and median of a roughly symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed distribution, the mean is usually farther out in the long tail than is the median.

[Figure: Left-Skewed Distribution | Right-Skewed Distribution]
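The 600-minute thought experiment is easy to check numerically. The sketch below uses Python's standard `statistics` module; the variable names are illustrative, and the data are the 15 North Carolina travel times from Section 1.3.2.

```python
from statistics import mean, median

nc_times = [5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60]

# Same data, but with the longest travel time changed from 60 to 600 minutes.
extreme_times = [600 if t == 60 else t for t in nc_times]

mean_original = mean(nc_times)
mean_extreme = mean(extreme_times)
median_original = median(nc_times)
median_extreme = median(extreme_times)

print(f"mean:   {mean_original:.1f} -> {mean_extreme:.1f} minutes")
print(f"median: {median_original} -> {median_extreme} minutes")
```

The mean more than doubles while the median stays at 20, which is what it means for the median to be resistant.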

Check Your Understanding

Questions 1 through 4 refer to the following setting. Here, once again, is the stemplot of travel times to work for 20 randomly selected New Yorkers. Earlier, we found that the median was 22.5 minutes.

1. Based only on the stemplot, would you expect the mean travel time to be less than, about the same as, or larger than the median? Why?
2. Use your calculator to find the mean travel time. Was your answer to Question 1 correct?
3. Interpret your result from Question 2 in context without using the words “mean” or “average.”
4. Would the mean or the median be a more appropriate summary of the center of this distribution of drive times? Justify your answer.

1.3.4 Measuring Spread: The Interquartile Range (IQR)

A useful numerical description of a distribution requires both a measure of center and a measure of spread.

How to Calculate Quartiles
Q1 | M | Q3

1. Arrange the observations in increasing order and locate the median M in the ordered list of observations.
2. The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the median.
3. The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the median.

Interquartile Range – IQR = Q3 − Q1

Example – Travel Times to Work in North Carolina
Calculating quartiles

Our North Carolina sample of 15 workers’ travel times, arranged in increasing order, is

5 10 10 10 10 12 15 20 20 25 30 30 40 40 60

There is an odd number of observations, so the median is the middle one, the first 20 in the list. The first quartile is the median of the 7 observations to the left of the median. This is the 4th of these 7 observations, so Q1 = 10 minutes. The third quartile is the median of the 7 observations to the right of the median, Q3 = 30 minutes. So the spread of the middle 50% of the travel times is IQR = Q3 − Q1 = 30 − 10 = 20 minutes. Be sure to leave out the overall median M when you locate the quartiles.
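A minimal sketch of the quartile procedure above follows, with the overall median left out when n is odd. The function names `median` and `quartiles` are illustrative. Library routines such as `statistics.quantiles` and `numpy.percentile` use other interpolation conventions and may return slightly different quartiles for the same data.

```python
def median(values):
    """Median of a list: center value (n odd) or average of the two centers."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the rule described in the text."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                 # observations left of the median
    # When n is odd, skip the median itself; when n is even, nothing is skipped.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


nc_times = [5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60]
q1, m, q3 = quartiles(nc_times)
iqr = q3 - q1

print(f"Q1 = {q1}, M = {m}, Q3 = {q3}, IQR = {iqr}")
```

For the North Carolina data this reproduces Q1 = 10, M = 20, Q3 = 30, and IQR = 20 minutes.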

The quartiles and the interquartile range are resistant because they are not affected by a few extreme observations.

Example – Stuck in Traffic Again
Finding and interpreting the IQR

Find and interpret the interquartile range (IQR).

1.3.5 Identifying Outliers

In addition to serving as a measure of spread, the interquartile range (IQR) is used as part of a rule of thumb for identifying outliers.

1.5 × IQR – Call an observation an outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.

Example – Travel Times to Work in New York
Identifying outliers using the 1.5 × IQR rule

Identify any outliers in the data from the stemplot.

Q1 = 15 minutes
Q3 = 42.5 minutes
IQR = 27.5 minutes

Example – Travel Times to Work in North Carolina
Identifying outliers

Determine if the travel time of 60 minutes in the sample of 15 North Carolina workers is an outlier.

Q1 = 10 minutes
Q3 = 30 minutes
IQR = 20 minutes
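The 1.5 × IQR rule can be written as a short function. In this sketch the quartiles are passed in as arguments, using the values quoted in the two examples, and the name `iqr_outliers` is made up for illustration. The comparison is strict because the rule says "more than" 1.5 × IQR.

```python
def iqr_outliers(values, q1, q3):
    """Return the observations flagged by the 1.5 x IQR rule."""
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr    # anything below this is an outlier
    high_cutoff = q3 + 1.5 * iqr   # anything above this is an outlier
    return [v for v in values if v < low_cutoff or v > high_cutoff]


# New York travel times, with Q1 and Q3 as given in the example.
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
ny_outliers = iqr_outliers(ny_times, q1=15, q3=42.5)

# North Carolina travel times, with Q1 and Q3 as given in the example.
nc_times = [5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60]
nc_outliers = iqr_outliers(nc_times, q1=10, q3=30)

print("New York outliers:", ny_outliers)
print("North Carolina outliers:", nc_outliers)
```

In New York the upper cutoff is 42.5 + 41.25 = 83.75, so the 85-minute time is flagged. In North Carolina the upper cutoff is 30 + 30 = 60, and a time of exactly 60 minutes is not more than the cutoff, so it is not flagged.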

1.3.6 The Five-Number Summary and Boxplots

Five-Number Summary – Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols, the five-number summary is

Minimum   Q1   M   Q3   Maximum

These five numbers divide each distribution roughly into quarters. About 25% of the data values fall between the minimum and Q1, about 25% are between Q1 and the median, about 25% are between the median and Q3, and about 25% are between Q3 and the maximum. The five-number summary of a distribution leads to a new graph, the boxplot (aka box and whisker plot).

How to Make a Boxplot

1. A central box is drawn from the first quartile (Q1) to the third quartile (Q3).
2. A line in the box marks the median.
3. Lines (called whiskers) extend from the box out to the smallest and largest observations that are not outliers.

Example – Home Run King
Making a Boxplot

Barry Bonds set the major league record by hitting 73 home runs in a single season in 2001. On August 7, 2007, Bonds hit his 756th career home run, which broke Hank Aaron’s longstanding record of 755. By the end of the 2007 season when Bonds retired, he had increased the total to 762. Here are data on the number of home runs that Bonds hit in each of his 21 complete seasons:

16 25 24 19 33 25 34 46 37 33 42 40 37 34 49 73 46 45 45 26 28

Make a boxplot for the data above. The initial steps have been done to save you time.
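The numbers needed for the boxplot can be computed as sketched below: the five-number summary and the whisker endpoints, meaning the smallest and largest observations that the 1.5 × IQR rule does not flag. The function names are illustrative, and the quartiles follow this handout's rule of leaving out the overall median when n is odd.

```python
def median(values):
    """Median of a list: center value (n odd) or average of the two centers."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, M, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


def whisker_ends(values):
    """Smallest and largest observations that are not outliers (1.5 x IQR rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_cutoff, high_cutoff = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_cutoff <= v <= high_cutoff]
    return min(inside), max(inside)


bonds_home_runs = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
                   40, 37, 34, 49, 73, 46, 45, 45, 26, 28]

summary = five_number_summary(bonds_home_runs)
whiskers = whisker_ends(bonds_home_runs)

print("min, Q1, M, Q3, max:", summary)
print("whiskers run from", whiskers[0], "to", whiskers[1])
```

With these data the upper cutoff is 45 + 1.5 × 19.5 = 74.25, so even the 73-home-run season is not flagged and the whiskers reach the minimum and maximum.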

Check Your Understanding

The 2009 roster of the Dallas Cowboys professional football team included 10 offensive linemen. Their weights (in pounds) were

338 318 353 313 318 326 307 317 311 311

1. Find the five-number summary for these data by hand. Show your work.
2. Calculate the IQR. Interpret this value in context.
3. Determine whether there are any outliers using the 1.5 × IQR rule.
4. Draw a boxplot of the data.

1.3.7 Measuring Spread: The Standard Deviation

The five-number summary is not the most common numerical description of a distribution. That distinction belongs to the combination of the mean to measure center and the standard deviation to measure spread. The standard deviation and its close relative, the variance, measure spread by looking at how far the observations are from their mean. Let’s explore this idea using a simple set of data.

Example – How Many Pets?
Investigating spread around the mean

Below are data on the number of pets owned by 9 children.

1 3 4 4 4 5 7 8 9

The mean number of pets is 5. Let’s look at where the observations in the data set are relative to the mean. The figure above displays the data in a dotplot, with the mean clearly marked. The data value 1 is 4 units below the mean. We say that its deviation from the mean is −4. What about the data value 7? Its deviation is 7 − 5 = 2 (it is 2 units above the mean). The arrows in the figure mark these two deviations from the mean. The deviations show how much the data vary about their mean. They are the starting point for calculating the variance and standard deviation.

The table to the left shows the deviation from the mean (x_i − x̄) for each value in the data set. Sum the deviations from the mean. You should get 0, because the mean is the balance point of the distribution. Since the sum of the deviations from the mean will be 0 for any set of data, we need another way to calculate spread around the mean. How can we fix the problem of the positive and negative deviations canceling out? We could take the absolute value of each deviation. Or we could square the deviations. For mathematical reasons beyond the scope of this book, statisticians choose to square rather than to use absolute values.
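The claim that the deviations always sum to 0 follows directly from the definition of the mean. The short derivation below is added for illustration and uses the same symbols as the text (x_i, x̄, n); the second line checks it on the pets data.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})
  &= \sum_{i=1}^{n} x_i - n\bar{x}
   = n\bar{x} - n\bar{x}
   = 0
   \qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \\
(-4) + (-2) + (-1) + (-1) + (-1) + 0 + 2 + 3 + 4 &= 0
\end{align*}
```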

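We have added a column to the table that shows the square of each deviation (x_i − x̄)². Add up the squared deviations. Did you get 52? Now we compute the average squared deviation, sort of. Instead of dividing by the number of observations n, we divide by n − 1:

$$\text{“average” squared deviation} = \frac{16 + 4 + 1 + 1 + 1 + 0 + 4 + 9 + 16}{9 - 1} = \frac{52}{8} = 6.5$$

The value 6.5 is called the variance.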

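Variance - The average squared distance of the observations in a data set from their mean. In symbols,

$$s_x^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$$

Because we squared all the deviations, our units are in “squared pets.” That’s no good. We’ll take the square root to get back to the correct units: pets. The resulting value is the standard deviation:

$$s_x = \sqrt{6.5} \approx 2.55$$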

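This 2.55 is roughly the average distance of the values in the data set from the mean.

Standard Deviation - The standard deviation s_x measures the average distance of the observations from their mean. It is calculated by finding an average of the squared distances and then taking the square root. This average squared distance is called the variance. In symbols, the variance s_x² is given by

$$s_x^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$$

How to Find the Standard Deviation

1. Find the distance of each observation from the mean and square each of these distances.
2. Average the squared distances by dividing their sum by n − 1.
3. The standard deviation s_x is the square root of this average squared distance:

$$s_x = \sqrt{\frac{1}{n - 1}\sum (x_i - \bar{x})^2}$$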
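The three steps can be traced on the pets data with a few lines of Python. This is a sketch for checking the hand calculation; the variable names are chosen here for illustration.

```python
import math

pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]
n = len(pets)
x_bar = sum(pets) / n                       # mean number of pets

# Step 1: deviation of each observation from the mean, then square it.
deviations = [x - x_bar for x in pets]
squared_deviations = [d ** 2 for d in deviations]

# Step 2: "average" the squared deviations by dividing their sum by n - 1.
sum_of_squares = sum(squared_deviations)
variance = sum_of_squares / (n - 1)

# Step 3: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
print("sum of squared deviations:", sum_of_squares)
print(f"variance = {variance}, standard deviation = {std_dev:.2f}")
```

The intermediate values match the text: the deviations sum to 0, the squared deviations sum to 52, the variance is 6.5, and the standard deviation is about 2.55 pets.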

Many calculators report two standard deviations, giving you a choice of dividing by n or by n − 1. The former is usually labeled σ_x, the symbol for the standard deviation of a population. If your data set consists of the entire population, then it’s appropriate to use σ_x. More often, the data we’re examining come from a sample. In that case, we should use s_x.

More important than the details of calculating s_x are the properties that determine the usefulness of the standard deviation:

• s_x measures spread about the mean and should be used only when the mean is chosen as the measure of center.

• s_x is always greater than or equal to 0. s_x = 0 only when there is no variability. This happens only when all observations have the same value. Otherwise, s_x > 0. As the observations become more spread out about their mean, s_x gets larger.

• s_x has the same units of measurement as the original observations. For example, if you measure metabolic rates in calories, both the mean x̄ and the standard deviation s_x are also in calories. This is one reason to prefer s_x to the variance s_x², which is in squared calories.

• Like the mean x̄, s_x is not resistant. A few outliers can make s_x very large. The use of squared deviations makes s_x even more sensitive than x̄ to a few extreme observations.
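The two standard deviations that calculators report have direct counterparts in Python's standard `statistics` module: `stdev` divides by n − 1 (the sample value s_x) and `pstdev` divides by n (the population value σ_x). The sketch below compares them on the pets data; the variable names are illustrative.

```python
from statistics import pstdev, stdev

pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]

s_x = stdev(pets)        # sample standard deviation: divides by n - 1
sigma_x = pstdev(pets)   # population standard deviation: divides by n

print(f"s_x     (divide by n - 1) = {s_x:.3f}")
print(f"sigma_x (divide by n)     = {sigma_x:.3f}")
```

Because n − 1 is smaller than n, s_x is always at least as large as σ_x for the same data. The gap shrinks as n grows.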

Check Your Understanding

The heights (in inches) of the five starters on a basketball team are 67, 72, 76, 76, and 84.

1. Find and interpret the mean.
2. Make a table that shows, for each value, its deviation from the mean and its squared deviation from the mean.
3. Show how to calculate the variance and standard deviation from the values in your table.
4. Interpret the meaning of the standard deviation in this setting.

1.3.9 Choosing Measures of Center and Spread

We now have a choice between two descriptions of the center and spread of a distribution: the median and IQR, or x̄ and s_x. Because x̄ and s_x are sensitive to extreme observations, they can be misleading when a distribution is strongly skewed or has outliers. In these cases, the median and IQR, which are both resistant to extreme values, provide a better summary. We’ll see in the next chapter that the mean and standard deviation are the natural measures of center and spread for a very important class of symmetric distributions, the Normal distributions.

Choosing Measures of Center and Spread

The median and IQR are usually better than the mean and standard deviation for describing a skewed distribution or a distribution with strong outliers. Use x̄ and s_x only for reasonably symmetric distributions that don’t have outliers.

Remember that a graph gives the best overall picture of a distribution. Numerical measures of center and spread report specific facts about a distribution, but they do not describe its entire shape. Numerical summaries do not highlight the presence of multiple peaks or clusters, for example. Always plot your data.

Example – Who Texts More: Males or Females?
Pulling it all together

For their final project, a group of AP Statistics students investigated their belief that females text more than males. They asked a random sample of students from their school to record the number of text messages sent and received over a two-day period. Here are their data: