Studying the Effects of Aging in Major League Baseball
Phil Birnbaum, www.philbirnbaum.com
Aging patterns in baseball
How do players age? Is it different for hitters and pitchers?
If you have a good player who's 31, how much do you expect him to decline over the next few years?
We want a result like: "hitters decline X% between ages 31 and 35."
Studies
Bill James' classic aging study in the "1982 Baseball Abstract"
Work by Tom Tango
Academic studies: Jim Albert, Ray C. Fair, and others
(This presentation is based mostly on Tango, with a bit of James.)
Previous findings
The best batters peak at 27; that's when most of the major awards are won (James).
Different skills peak at different times: speed early, home runs mid-career, walks late (Tango).
A naive look
What's the average performance of the various age cohorts?
Fairly similar, it turns out, except at the extremes.
[Chart: Average Batting vs. Age. x-axis: age, 18 to 45; y-axis: runs per game (RC27), 0 to 6]
[Chart: Average Pitching vs. Age. x-axis: age, 18 to 45; y-axis: component ERA, 0 to 6]
A naive look: a statistical illusion
The curve traces different groups of players.
Players at 25 are a cross-section of the league.
Players at 40 are former superstars.
The players at 40 were much better players when they were 25.
Example
Age 27: Player A 6.00, Player B 5.00, Player C 4.00; average 5.00
Age 35: Player A 5.50, Player B 4.50, Player C released; average 5.00
Age 40: Player A 5.00, Player B retired, Player C released; average 5.00
All players decline with age, but the mean is still 5.00.
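To make the illusion concrete, here is a minimal Python sketch of the naive cohort average applied to the three hypothetical players in the example. The dictionary layout and the function name `naive_cohort_average` are illustrative choices, not part of the original study.

```python
# The example's hypothetical players: runs per game at each age.
# An age missing from a player's record means he was out of the league.
careers = {
    "A": {27: 6.00, 35: 5.50, 40: 5.00},
    "B": {27: 5.00, 35: 4.50},  # retired before 40
    "C": {27: 4.00},            # released before 35
}


def naive_cohort_average(careers, age):
    """Unweighted mean of everyone still playing at `age`."""
    values = [record[age] for record in careers.values() if age in record]
    return sum(values) / len(values)


for age in (27, 35, 40):
    print(age, f"{naive_cohort_average(careers, age):.2f}")
# 27 5.00
# 35 5.00
# 40 5.00
```

Every surviving player loses 0.50 runs per game between each listed age, yet the cohort mean stays flat, because the players who drop out are the weaker ones.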
Paired seasons
The "paired seasons" method:
Find all players who were 28 in season X.
See how they did in season X+1.
(Weight the average by playing time.)
The average difference reflects the effects of aging from 28 to 29.
The career path is obtained by chaining (multiplying) the single-year effects.
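A minimal Python sketch of the paired-seasons steps, under stated assumptions: the record layout, the function names, and the six player-seasons are hypothetical; each pair is weighted by playing time in season X+1 (one possible weighting, since the slides don't say which season's playing time to use); and each single-year effect is expressed as a ratio so that chaining is a simple product.

```python
from collections import defaultdict

# Hypothetical player-seasons: (player_id, age, season, runs_per_game, playing_time).
# The numbers are made up purely to exercise the code.
seasons = [
    ("p1", 27, 2000, 5.0, 600), ("p1", 28, 2001, 5.2, 600), ("p1", 29, 2002, 5.0, 550),
    ("p2", 27, 2000, 4.0, 400), ("p2", 28, 2001, 4.0, 500), ("p2", 29, 2002, 3.8, 300),
]


def single_year_ratios(seasons):
    """For each age, the weighted mean of perf(season X+1) / perf(season X)."""
    by_season = {(pid, year): (age, perf, pt) for pid, age, year, perf, pt in seasons}
    num, den = defaultdict(float), defaultdict(float)
    for (pid, year), (age, perf, _) in by_season.items():
        following = by_season.get((pid, year + 1))
        if following is None:
            continue  # no season X+1, so this player contributes no pair
        _, next_perf, next_pt = following
        # Assumption: weight each pair by playing time in season X+1.
        num[age] += next_pt * next_perf / perf
        den[age] += next_pt
    return {age: num[age] / den[age] for age in num}


def chain(ratios):
    """Multiply consecutive single-year ratios; scale so the peak equals 1.0.

    Assumes the ages in `ratios` are consecutive, with no gaps.
    """
    ages = sorted(ratios)
    level = {ages[0]: 1.0}
    for age in ages:
        level[age + 1] = level[age] * ratios[age]
    peak = max(level.values())
    return {age: value / peak for age, value in level.items()}


ratios = single_year_ratios(seasons)
curve = chain(ratios)
for age in sorted(curve):
    print(age, round(curve[age], 3))
```

Expressing each step as a ratio, rather than a raw difference, is what lets the single-year effects be multiplied into a career curve scaled to the peak.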
[Chart: Paired seasons: Batting. x-axis: age, 19 to 46; y-axis: RC27 relative to peak, 0 to 1.2]
[Chart: Paired seasons: Pitching. x-axis: age, 18 to 45; y-axis: ERA relative to peak, 0 to 5]
Paired seasons: results biased
The paired-seasons method shows big declines as players age.
But it suffers from a bias: selective sampling.
Players who were "lucky" in season X (large positive error term) get more playing time in season X+1.
Those "lucky" players will show bigger declines.
So big declines are over-represented.
Example
Three 37-year-olds, all of whom have a true skill of .250 this year and .240 next year.
This year, due to chance, they hit .200, .250, and .300 respectively.
The .200 guy is forced to retire.
The .250 guy plays half time next year and loses 10 points (.250 to .240).
The .300 guy plays full time next year and loses 60 points (.300 to .240).
Weighting by playing time, the average loss is (0.5 × 10 + 1 × 60) / 1.5 ≈ 43 points, not 10 points.
The decline is greatly overestimated.
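The weighted-average arithmetic in the example can be checked with a few lines of Python. The names are hypothetical. As in the example, each player is assumed to hit exactly his .240 skill next year, and the retired player gets zero weight.

```python
# The example's three 37-year-olds: observed average this year, performance
# next year (assumed equal to the .240 skill), and next-year playing time
# (0 = retired, 0.5 = half time, 1.0 = full time).
observed = [0.200, 0.250, 0.300]
next_year = [0.240, 0.240, 0.240]
playing_time = [0.0, 0.5, 1.0]


def weighted_decline(before, after, weights):
    """Playing-time-weighted average drop, in points (thousandths)."""
    total = sum(w * (b - a) for b, a, w in zip(before, after, weights))
    return 1000 * total / sum(weights)


print(round(weighted_decline(observed, next_year, playing_time), 1))  # 43.3
```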
How can we eliminate this bias?
We can try to estimate the "true" talent of the three players by regressing to the mean:
The .200 guy is "probably" .220.
The .250 guy is "probably" .250.
The .300 guy is "probably" .280.
Now the third guy declines only 40 points, not 60.
Average decline: (0.5 × 10 + 1 × 40) / 1.5 = 30 points, more accurate than the previous estimate of 43 points.
If we regressed "perfectly" (all players to their true talent of .250), we'd get the right answer (10 points).
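A sketch of the same calculation after regressing season X toward the mean. The slide doesn't state a regression percentage; 40% toward .250 is inferred here only because it reproduces the .220 / .250 / .280 estimates. `regress` and the other names are hypothetical.

```python
def regress(observed, mean, fraction):
    """Move an observed rate `fraction` of the way toward `mean`."""
    return observed + fraction * (mean - observed)


observed = [0.200, 0.250, 0.300]
next_year = [0.240, 0.240, 0.240]
playing_time = [0.0, 0.5, 1.0]
mean = 0.250  # value to regress toward, matching the example's .250 talent

# 40% regression (inferred) reproduces the example's .220 / .250 / .280 estimates.
estimated = [regress(x, mean, 0.40) for x in observed]
decline = sum(w * (e - n) for e, n, w in zip(estimated, next_year, playing_time)) / sum(playing_time)
print([round(e, 3) for e in estimated], round(1000 * decline, 1))
# [0.22, 0.25, 0.28] 30.0
```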
Regressing season X
How much to regress? We need to do some research to figure that out.
We can probably get a theoretical lower bound from the binomial (multinomial) distribution.
For now, consider 10% and 30%.
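One way to formalize the binomial lower bound, offered as an assumption rather than as the presentation's method: if the observed variance across players is talent variance plus chance variance, and chance alone contributes at least the binomial variance p(1 - p)/n, then the fraction to regress is at least chance variance divided by observed variance. The function name and the inputs in the usage line are illustrative, not measured values. This covers a single yes/no rate such as batting average; a composite like RC27 would need the multinomial version.

```python
def binomial_regression_fraction(p, n, observed_sd):
    """Lower bound on the fraction to regress toward the mean (an assumption).

    p: league-average rate, e.g. batting average
    n: opportunities per player, e.g. at-bats
    observed_sd: standard deviation of observed rates across players
    Treats binomial chance as the only noise, so real noise can only be larger.
    """
    chance_var = p * (1 - p) / n
    observed_var = observed_sd ** 2
    return min(1.0, chance_var / observed_var)


# Illustrative inputs only: a .250 league, 500 at-bats, observed spread of .030.
print(round(binomial_regression_fraction(0.250, 500, 0.030), 3))  # 0.417
```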
[Chart: Batting, regressed 10%. x-axis: age, 19 to 46; y-axis: percentage of peak (0 to 1.2)]
[Chart: Batting, regressed 30%. x-axis: age, 19 to 46; y-axis: percentage of peak (0 to 1.2)]
[Chart: Pitching, regressed 10%. x-axis: age, 17 to 45; y-axis: percentage of peak (0 to 5)]
[Chart: Pitching, regressed 30%. x-axis: age, 17 to 45; y-axis: percentage of peak (0 to 5)]
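The sensitivity to the regression amount shows up even in the three-player example. This hypothetical Python sketch reruns that example at several regression fractions, keeping the same assumed .240 next-year performance and the same playing-time weights.

```python
def regress(observed, mean, fraction):
    """Move an observed rate `fraction` of the way toward `mean`."""
    return observed + fraction * (mean - observed)


def estimated_decline(fraction, observed=(0.200, 0.250, 0.300),
                      next_year=(0.240, 0.240, 0.240),
                      playing_time=(0.0, 0.5, 1.0), mean=0.250):
    """Weighted decline, in points, after regressing season X by `fraction`."""
    est = [regress(x, mean, fraction) for x in observed]
    total = sum(w * (e - n) for e, n, w in zip(est, next_year, playing_time))
    return 1000 * total / sum(playing_time)


for fraction in (0.0, 0.10, 0.30, 0.40, 1.0):
    print(f"{fraction:.0%} regressed: {estimated_decline(fraction):.1f} points")
# 0% regressed: 43.3 points
# 10% regressed: 40.0 points
# 30% regressed: 33.3 points
# 40% regressed: 30.0 points
# 100% regressed: 10.0 points
```

Only full regression recovers the true 10-point decline here, because in this toy example every player has the same .250 talent.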
Conclusions
Results are sensitive to how much we regress.
Getting correct estimates of aging with the paired-seasons method depends on solving the selective-sampling problem and/or figuring out how much to regress.
Alternative: fit curves to careers (Albert, Fair).
But this method requires a long career, which means only the most successful players are analyzed.
So there are some selective-sampling issues there too.
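For comparison, a minimal sketch of the curve-fitting alternative: fit a single quadratic in age to one player's career by ordinary least squares and read off the peak. This is a simplifying assumption, not Albert's or Fair's actual model, and the career below is synthetic data generated from a curve that peaks at 27. The function names are hypothetical. It only works for players with enough seasons to fit, which is exactly the selective-sampling concern in the conclusions.

```python
def fit_quadratic(xs, ys):
    """Ordinary least squares for y = a*x**2 + b*x + c; returns (a, b, c)."""
    rows = [(x * x, x, 1.0) for x in xs]
    # Augmented normal equations [X^T X | X^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    # Back-substitution.
    coef = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        coef[i] = (m[i][3] - sum(m[i][k] * coef[k] for k in range(i + 1, 3))) / m[i][i]
    return tuple(coef)


def peak_age(ages, values):
    """Age at which a fitted quadratic career curve tops out."""
    center = sum(ages) / len(ages)  # centering keeps the normal equations well-conditioned
    a, b, _ = fit_quadratic([age - center for age in ages], values)
    if a >= 0:
        raise ValueError("fitted curve has no interior peak")
    return center - b / (2 * a)


# Synthetic career generated from a known curve that peaks at 27 (not real data).
ages = list(range(22, 36))
values = [5.0 - 0.02 * (age - 27) ** 2 for age in ages]
print(round(peak_age(ages, values), 2))  # 27.0
```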
References
Bill James, "Looking For the Prime," 1982 Bill James Baseball Abstract, p. 191
Tom Tango, http://tangotiger.net/agepatterns.txt
Tom Tango, "Forecasting Pitchers – Adjacent Seasons," http://www.tangotiger.net/adjacentPitching.html
Ray C. Fair, "Estimated Age Effects in Baseball," http://www.bepress.com/jqas/vol4/iss1/1/