More descriptive statistics:
Percentiles, boxplots, and z-scores
Outline for today
Better know a player: Wade Boggs
Review:
• Central tendency and measures of varia…
Better know a player
Wade Boggs
Any questions…
Descriptive statistics
What is a statistic?
A statistic is a numerical summary (func…
The mean
Mean = $\frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x_i}{n}$
Sample mean (x̅) vs. population…
The median
The median is the value in the middle of your data
• ½ of the values are greater than the median and ½ are less
The median is resistant to outliers, whereas the mean is not
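A quick way to see the resistance claim is to change one value and watch both summaries. The sketch below is in Python, using only the standard library, and the two small lists are made-up toy values, not data from these slides.

```python
from statistics import mean, median

# Made-up toy values (not from the slides), chosen so the effect is easy to see.
typical = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]  # same list, but the largest value is extreme

# The median stays at the middle value; the mean is dragged toward the outlier.
print("typical:      mean =", mean(typical), " median =", median(typical))
print("with outlier: mean =", mean(with_outlier), " median =", median(with_outlier))
```

Only the mean moves when the single extreme value is introduced.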
The standard deviation
[Figure labels: Mean; Mean ± stdev]
Large vs. small standard deviations
Same mean, different standard deviations…
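To make "same mean, different standard deviation" concrete, here is a minimal Python sketch. The two lists are made-up toy values (an assumption, not data from the slides), and `statistics.stdev` computes the sample standard deviation, which may or may not be the version used in class.

```python
from statistics import mean, stdev

# Made-up toy values (not from the slides): both lists average 30.
tight = [28, 29, 30, 31, 32]
spread = [10, 20, 30, 40, 50]

tight_sd = stdev(tight)    # sample standard deviation (divides by n - 1)
spread_sd = stdev(spread)

print("means:", mean(tight), mean(spread))
print("standard deviations:", round(tight_sd, 2), round(spread_sd, 2))
```

The second list has the same center but values that sit much farther from it, so its standard deviation is larger.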
The 95% rule (of thumb)
If a distribution…
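The rule of thumb is usually stated as: for roughly bell-shaped data, about 95% of the values fall within two standard deviations of the mean. The Python sketch below checks this on simulated data. The normal distribution, its mean of 0 and standard deviation of 1, and the sample size are all arbitrary assumptions made for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Simulated bell-shaped data; mean 0 and standard deviation 1 are arbitrary choices.
values = [random.gauss(0, 1) for _ in range(10_000)]

m, s = mean(values), stdev(values)

# Count the values that fall inside mean ± 2 standard deviations.
within = sum(1 for v in values if m - 2 * s <= v <= m + 2 * s)
share_within = within / len(values)

print(f"{share_within:.1%} of values lie within 2 standard deviations of the mean")
```

For skewed data or data with heavy tails the share can be noticeably different, which is why this is only a rule of thumb.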
Percentiles
The pth percentile is the value of a quan…
Percentiles/quantiles
https://emeyers.shinyapps.io/baseball_stat_percen…
What is a good statistic for…?
Use the website to determine what “good” values are for the following statistics…
Putting statistics in context
90th percentile…
Calculating percentiles
The pth percentile is the value of a quan…
Order        1   2   3   4   5   6   7   8   9   10  11
Sorted data  23  23  28  29  30  32  35  35  37  38  54
Percentile   0   10  20  30  40  50  60  70  80  90  100
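The table can be reproduced in a few lines. This Python sketch sorts the 11 home run values and gives the value in position i (counting from 1) the percentile 100 × (i − 1) / (n − 1), which is the spacing the table shows. Other percentile conventions exist, so treat the formula as one reasonable reading of the table.

```python
# David Ortiz home run values as listed on the slides.
ortiz_hr = [54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38]

sorted_hr = sorted(ortiz_hr)
n = len(sorted_hr)

# Percentile of the i-th sorted value (i counted from 1): 100 * (i - 1) / (n - 1).
percentiles = [100 * (i - 1) / (n - 1) for i in range(1, n + 1)]

for order, (value, pct) in enumerate(zip(sorted_hr, percentiles), start=1):
    print(f"order {order:2d}  value {value:2d}  percentile {pct:5.1f}")
```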
Calculating percentiles
Typically we ask for a value that is at the pth percentile…
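Going the other way, from a percentile to a value, needs a rule for percentiles that fall between two data points. The Python sketch below uses linear interpolation between neighboring sorted values. `percentile_value` is a hypothetical helper written for this example, and the interpolation rule is an assumption, though it reproduces the quartiles reported later in the deck.

```python
def percentile_value(data, p):
    """Value at the p-th percentile (0 <= p <= 100), by linear interpolation."""
    ordered = sorted(data)
    position = p / 100 * (len(ordered) - 1)    # 0-based position, may be fractional
    lower = int(position)                      # index at or just below the position
    upper = min(lower + 1, len(ordered) - 1)   # next index, capped at the last value
    fraction = position - lower                # how far between the two neighbors
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# David Ortiz home run values as listed on the slides.
ortiz_hr = [54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38]

for p in (25, 50, 75):
    print(f"{p}th percentile:", percentile_value(ortiz_hr, p))
```

With 11 values, the 25th percentile lands halfway between the 3rd and 4th sorted values, so the result is their average.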
Five Number Summary
Five Number Summary = (min, Q1, median, Q3, max)
Q1 = 25th percentile…
Range and Interquartile Range
Range = maximum – minimum
Interquartile range (IQR) = Q3 – Q1
R: IQR(x)
Compute: 5 number summary, range, and IQR for David Ortiz home runs
1. Five Number Summary = (min, Q1, median, Q3, max)
2. Range = maximum – minimum
3. Interquartile range (IQR) = Q3 – Q1
Also use the percentile…
5 number summary, range, and IQR for David Ortiz home runs
1. Five Number Summary: (23, 28.5, 32, 36, 54)
2. Range: 31
3. Interquartile range (IQR) = 7.5
The 5 number summary for HRs for all player-seasons with over 500 PA is: (0, 4, 10, 20, 73)
54 35 23 28 32 29 23 30 35 37 38
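The three answers can be checked with Python's standard library. This is a sketch, not the course's R workflow: `statistics.quantiles` with `method="inclusive"` is chosen here because it interpolates between sorted values in a way that matches the quartiles on the slide. Other quartile conventions can give slightly different Q1 and Q3.

```python
from statistics import quantiles

# David Ortiz home run values as listed on the slides.
ortiz_hr = [54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38]

# n=4 splits the data into quarters and returns the three cut points Q1, median, Q3.
q1, med, q3 = quantiles(ortiz_hr, n=4, method="inclusive")

five_number = (min(ortiz_hr), q1, med, q3, max(ortiz_hr))
data_range = max(ortiz_hr) - min(ortiz_hr)
iqr = q3 - q1

print("Five number summary:", five_number)
print("Range:", data_range)
print("IQR:", iqr)
```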
Detecting outliers
As a rule of thumb, we call a data value an outlier if it is:
Smaller than: Q1 - 1.5 * IQR
Larger than: Q3 + 1.5 * IQR
Are there any outliers in David Ortiz's home run numbers?
1. Five Number Summary: (23, 28.5, 32, 36, 54)
2. Range: 31
3. Interquartile…
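The rule of thumb turns into two cutoffs, often called fences. The Python sketch below computes them for the home run data and lists any values outside them. The names `lower_fence` and `upper_fence` are introduced for this example, and the quartiles come from `statistics.quantiles` with `method="inclusive"`, an assumed convention that matches the slide's Q1 and Q3.

```python
from statistics import quantiles

# David Ortiz home run values as listed on the slides.
ortiz_hr = [54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38]

q1, _, q3 = quantiles(ortiz_hr, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr   # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr   # values above this are flagged as outliers

outliers = [x for x in ortiz_hr if x < lower_fence or x > upper_fence]

print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```

With Q1 = 28.5 and IQR = 7.5 the fences are 17.25 and 47.25, so the 54 home run value is the only one flagged.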
Boxplots
A boxplot is a graphical display of the 5 number summary and consists of:
1. Drawing a box from Q1 to Q3
2. Dividing the box with a line drawn at the median
3. Draw a line from each quartile…
Box plot of David Ortiz home runs
R: boxplot(x)
[Figure: box plot; axis label: Home runs]
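Before drawing anything, it can help to compute the numbers a box plot displays. The Python sketch below returns the box edges, the median line, the whisker ends, and the outlier points. `boxplot_parts` is a hypothetical helper written for this example. It assumes whiskers end at the most extreme values that are not outliers under the 1.5 × IQR rule, and it does not claim to reproduce R's `boxplot(x)` exactly.

```python
from statistics import quantiles


def boxplot_parts(data):
    """Numbers needed to draw a box plot, using the 1.5 * IQR outlier rule."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Whiskers stop at the most extreme values that are still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "box": (q1, q3),
        "median": med,
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


# David Ortiz home run values as listed on the slides.
ortiz_hr = [54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38]
parts = boxplot_parts(ortiz_hr)
for name, value in parts.items():
    print(name, value)
```

The upper whisker stops at 38 rather than 54, because 54 is plotted separately as an outlier.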
Box plot quiz
[Figure: box plot of home runs with parts labeled A, B, C, D, E, F]
What is:
• Q1?
• Q3?
• The median?
• Most extreme values that are not outliers
• Outliers
Two current players: who is best?
Miguel Cabrera: HR in 2014 = 25
David Ortiz…
Comparing players with side-by-side box plots
How would you describe the differences between these two players in terms of HRs? Who is better?
[Figure: side-by-side box plots, labeled A and B]
Let’s compare two more players
Wade Boggs (1985): BA = .368
Ted Williams (1941): BA = .406
Career best seasons. Who is better?
Who is best here?
Is Ted Williams better than Wade Boggs?
Ted Williams hit .406 in 1941; plenty of people hit over .400 before him, but no one has since…
Have the best players gotten worse at hitting over the past 140 years?
[Figure: max batting average by year]
Comparing players across time periods
Problem: baseball has changed from 1871 to now
We can’t simply compare statistics…
Histograms of batting average 1941 vs. 1985
Do the batting averages look similar in these years?
Density of batting average 1941 vs. 1985
Do the batting averages look similar in these years?
z-scores
The z-score tells how many standard deviations…
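A z-score is computed as (value − mean) / standard deviation. The Python sketch below applies that to the David Ortiz home run values from earlier in the deck, asking how unusual the 54 home run value is relative to the other listed values. `z_score` is a hypothetical helper, and using the sample standard deviation (`statistics.stdev`) is an assumption; the slides may use a different reference group or the population version.

```python
from statistics import mean, stdev


def z_score(x, values):
    """How many standard deviations x lies above (+) or below (-) the mean of values."""
    return (x - mean(values)) / stdev(values)


# David Ortiz home run values as listed on the slides.
ortiz_hr = [54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38]

z_54 = z_score(54, ortiz_hr)
z_23 = z_score(23, ortiz_hr)

print("z-score of 54 home runs:", round(z_54, 2))
print("z-score of 23 home runs:", round(z_23, 2))
```

A positive z-score means the value is above the mean and a negative one means it is below. Because z-scores are unit-free, values measured against different reference groups can be put on the same scale.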
z-scores for comparing players across eras
When comparing players across eras, we will use the mean (x̄), and standard deviation…
Comparing Ted and Wade to their peers
In 1941:
• Mean batting average was: .276
• Standard deviation…
Comparing Ted and Wade to their peers
Wade’s batting average z-score: 3.82
Ted’s batting average z-score: 3.97
Who is the better hitter?
Career z-scored batting averages
What about Home Runs…
Next class: correlation!
Q and R: Big Data Baseball chapter 4