Baseball Statistics Joseph Mark October 6, 2009


Baseball Statistics

Joseph Mark

October 6, 2009


History of Baseball

• Germans – Schlagball

• English – Rounders
– 1745 referenced as base ball
– Formalized rules in 1884

• Pitched like a softball

• 9 players field, unlimited bat


Baseball in America

• Abner Doubleday (1839)
– Mills report in 1908

• Alexander Cartwright (1845)
– Formalized rules


Stats to know

• At Bats (AB) – batting appearances, not including bases on balls, hit by pitch, sacrifice hits (bunts), sacrifice flies, catcher's interference, or obstruction

• Plate Appearances (PA) – number of completed batting appearances no matter the result (at-bats + walks + hit-batsmen + sacrifice hits (bunts) + sacrifice flies + catcher's interference/obstruction)

• Hits (H) – times reached base because of a batted fair ball without an error by the defense

• Runs (R) – times reached home plate legally and safely

• Total Bases (TB) – 1 * singles + 2 * doubles + 3 * triples + 4 * home runs

• Sacrifice Fly (SF) – number of fly-ball outs that allow a runner to score

• Sacrifice Hit (SH) – a deliberate bunt that allows a runner to advance

• Strike out (K) – number of times put out by recording three strikes

• Walk (BB) – number of times reached base by receiving four balls
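The two counting formulas on this slide (Total Bases and Plate Appearances) can be sketched in a few lines. This is only an illustration of the definitions above; the function names and the sample season line are invented, not taken from the presentation.

```python
def total_bases(singles, doubles, triples, home_runs):
    """Total Bases (TB) = 1*singles + 2*doubles + 3*triples + 4*home runs."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def plate_appearances(at_bats, walks, hit_by_pitch, sac_hits, sac_flies,
                      interference=0):
    """Plate Appearances (PA): at-bats plus every outcome excluded from AB."""
    return at_bats + walks + hit_by_pitch + sac_hits + sac_flies + interference


# Hypothetical season line, invented for illustration only.
tb = total_bases(singles=100, doubles=30, triples=5, home_runs=20)
pa = plate_appearances(at_bats=500, walks=60, hit_by_pitch=5,
                       sac_hits=2, sac_flies=4)
print(f"TB = {tb}, PA = {pa}")
```

Keeping PA and AB as separate functions mirrors the slide: rate stats such as batting average divide by AB, while OBP-style stats divide by something closer to PA.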


Stats to know

• Innings Pitched (IP) – number of outs recorded pitching / 3

• Earned Run Average (ERA) – earned runs * 9 / innings pitched

• Complete Game (CG) – # of times a pitcher was the only pitcher for his team

• Shutout (SHO) – # of complete games allowing zero runs

• Save (Sv)

• Win (W) – number of games where pitcher was pitching while his team took the lead and went on to win

• Walks + Hits per Inning Pitched (WHIP)
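A short sketch of the pitching rate stats may help here. It assumes the conventional box-score notation in which the digit after the decimal point counts outs rather than tenths (so 228.2 means 228 innings plus 2 outs), and it takes WHIP to be (walks + hits) / innings pitched, as the name suggests. The helper names and sample numbers are hypothetical.

```python
def innings_from_notation(ip_text):
    """Convert box-score innings such as '228.2' (228 innings + 2 outs) to a float."""
    whole, _, outs_text = ip_text.partition(".")
    outs = int(outs_text) if outs_text else 0
    if outs not in (0, 1, 2):
        raise ValueError("the digit after the point must be 0, 1 or 2 outs")
    return int(whole) + outs / 3


def era(earned_runs, innings):
    """Earned Run Average = earned runs * 9 / innings pitched."""
    return earned_runs * 9 / innings


def whip(walks, hits, innings):
    """Walks + Hits per Inning Pitched."""
    return (walks + hits) / innings


# Hypothetical pitcher line, invented for illustration only.
ip = innings_from_notation("228.2")
print(f"IP = {ip:.3f}, ERA = {era(80, ip):.2f}, WHIP = {whip(45, 199, ip):.3f}")
```

Converting the notation first matters: treating 228.2 as a plain decimal would understate the innings and slightly inflate both ERA and WHIP.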


Basic Baseball Stats

• Batting Average (AVG) – hits / at bats

• Home Runs (HRs)

• Runs Batted In (RBIs)

• Earned Run Average (ERA) – ER * 9 / IP

• Strikeout (K)

• Wins (W)


Importance of Statistics in Baseball

• Game of individual matchups
– Pitcher vs. Hitter

Nolan Ryan vs.   AB   H
Gates Brown      24   0
Lonnie Smith     24   12
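As a quick worked example, the matchup lines above translate into batting averages as follows. The AB and H figures are the ones on the slide; the function names and the formatting helper are only illustrative.

```python
def batting_average(hits, at_bats):
    """Batting Average (AVG) = hits / at bats."""
    if at_bats == 0:
        raise ValueError("batting average is undefined with zero at bats")
    return hits / at_bats


def format_average(avg):
    """Render an average the way box scores do, e.g. 0.5 -> '.500'."""
    return f"{avg:.3f}".lstrip("0")


# (at bats, hits) against Nolan Ryan, as listed on the slide.
matchups = {"Gates Brown": (24, 0), "Lonnie Smith": (24, 12)}
for hitter, (at_bats, hits) in matchups.items():
    print(hitter, format_average(batting_average(hits, at_bats)))
# prints .000 for Gates Brown and .500 for Lonnie Smith
```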


“Stats don’t lie, but they don’t tell the whole truth”

W    IP      ERA    WHIP    K     CG   SHO   WP
17   228.2   3.15   1.067   214   4    3     6
20   220.1   3.51   1.257   213   0    0     14   League high


“Stats don’t lie, but they don’t tell the whole truth”

Player A - .251 batting average

199 strikeouts

Player B - .316 batting average

57 strikeouts


But what we didn’t tell you

• Player A – 48 home runs

81 walks

• Player B – 9 home runs

23 walks


So we have more stats

• On-base percentage (OBP) – (H + BB + HBP) / (AB + BB + HBP + SF)

• Slugging Percentage (SLG%) – Total bases / at bats

• On Base Plus Slugging (OPS) – OBP + SLG
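The three formulas on this slide compose directly, which a small sketch makes concrete. The function names and the sample batting line are hypothetical; only the formulas come from the slide.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_percentage(total_bases, ab):
    """SLG = total bases / at bats."""
    return total_bases / ab


def ops(h, bb, hbp, ab, sf, total_bases):
    """OPS = OBP + SLG."""
    return on_base_percentage(h, bb, hbp, ab, sf) + slugging_percentage(total_bases, ab)


# Hypothetical batting line, invented for illustration only.
line = dict(h=150, bb=60, hbp=5, ab=500, sf=5, total_bases=250)
print(f"OBP = {on_base_percentage(150, 60, 5, 500, 5):.3f}")
print(f"SLG = {slugging_percentage(250, 500):.3f}")
print(f"OPS = {ops(**line):.3f}")
```

Note that OBP and SLG use different denominators, so OPS is a convenient sum rather than a true rate; that is one reason the later slides look at alternatives such as Runs Created.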


Picking a stat is hard to do

How do we pick just one?!


Sabermetrics

• Bill James – Society for American Baseball Research
– Bill James defined sabermetrics as "the search for objective knowledge about baseball." Thus, sabermetrics attempts to answer objective questions about baseball, such as "which player on the Red Sox contributed the most to the team's offense?" or "How many home runs will Ken Griffey Jr. hit next year?" It cannot deal with the subjective judgments which are also important to the game, such as "Who is your favorite player?" or "That was a great game."


Usefulness of Sabermetrics

• Shortcomings of batting average/home runs/RBIs

• Better predictor of future performance

• Runs Created

• Markov Runs per Game

• Win Shares


Runs Created

• “With regard to an offensive player, the first key question is how many runs have resulted from what he has done with the bat and on the basepaths. Willie McCovey hit .270 in his career, with 353 doubles, 46 triples, 521 home runs and 1,345 walks -- but his job was not to hit doubles, nor to hit singles, nor to hit triples, nor to draw walks or even hit home runs, but rather to put runs on the scoreboard. How many runs resulted from all of these things?”


SABR stats

• Runs Created
– A = On-base factor, B = Advancement factor, C = Opportunity factor
– RC/27: (RC x 3 x LgIP) / (2 x LgG) / (AB – H + SH + SF + CS + GDP)
– Compare each player’s contribution over 1 game

• Win Share
– Measure of total performance, cumulative

• Markov RPG
– Takes into account runs scored, on base (hits + walks + hit by pitch), total bases, runs batted in, and stolen bases using (stolen bases * stolen bases) / stolen base attempts
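The Runs Created items on this slide can be sketched as follows. The slide names the A, B and C factors but does not show how they combine, so this sketch assumes Bill James's basic version, RC = A x B / C with A = H + BB, B = TB and C = AB + BB; later versions of the formula add more terms. It also assumes LgG counts each league game once, so that (3 x LgIP) / (2 x LgG) is the average number of outs per team game. All function names and sample numbers are hypothetical.

```python
def runs_created_basic(hits, walks, total_bases, at_bats):
    """Basic Runs Created, assumed here to be A * B / C."""
    on_base = hits + walks          # A: on-base factor
    advancement = total_bases       # B: advancement factor
    opportunity = at_bats + walks   # C: opportunity factor
    return on_base * advancement / opportunity


def outs_made(at_bats, hits, sac_hits, sac_flies, caught_stealing, gidp):
    """Denominator of the slide's RC/27 formula: AB - H + SH + SF + CS + GDP."""
    return at_bats - hits + sac_hits + sac_flies + caught_stealing + gidp


def rc_per_27(rc, outs, league_ip, league_games):
    """RC/27 = (RC * 3 * LgIP) / (2 * LgG) / outs, as written on the slide."""
    outs_per_team_game = (3 * league_ip) / (2 * league_games)
    return rc * outs_per_team_game / outs


# Hypothetical player and league totals, invented for illustration only.
rc = runs_created_basic(hits=150, walks=50, total_bases=250, at_bats=500)
outs = outs_made(at_bats=500, hits=150, sac_hits=2, sac_flies=4,
                 caught_stealing=3, gidp=11)
print(f"RC = {rc:.1f}, RC/27 = {rc_per_27(rc, outs, 43500.0, 2430):.2f}")
```

Dividing by outs is what turns a cumulative total into a per-game rate, which is why the slide describes RC/27 as comparing each player's contribution over one game.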


Comparison of Runs Created vs. Actual Runs Scored by All 30 Major League Teams in 2008


Win Shares, please share

• Win Shares Explained

• First, you divide responsibility for a team's wins between the offense (batting and baserunning) and defense (pitching and fielding). You do this by calculating the team run differential through a method James calls Marginal Runs. You first calculate the average number of runs scored per team in the league. You next adjust your team's runs scored and runs allowed for the ballpark in which they played half their games (i.e. home games). Then you add together two figures: all runs scored over 52% of the league average (credited to the offense), and all runs allowed less than 152% of the league average (credited to the defense). This is total marginal runs.

• Next, you take the percent of marginal runs contributed by the offense, multiply it by the number of wins times three. This is the total number of offensive Win Shares. You do the same thing for defensive Win Shares.

• Next, you attribute offensive Win Shares to individual players. This is done through two key metrics: Runs Created and Outs Made. Runs Created is a formula built by James and refined over the years. It starts with the basic equation of OBP times total bases and then adds player credit for other factors, including stolen bases, caught stealing, grounding into double plays, batting average and home runs with runners in scoring position and the kitchen sink. Runs Created is calculated for every single batter, including pitchers (if they're in the National League).

• Next, you subtract the league "background" Runs Created (52% of the league average) from each player's Runs Created based on the number of Outs Made by that batter, adjust it for ballpark, and credit each player with the result; essentially individual marginal runs created. Add these up for all players and use each player's percentage of the whole to allocate offensive Win Shares to each. Note that any player whose Runs Created are less than 52% of the league average runs created per out is credited with no Win Shares. This doesn't happen very often (except for pitchers).

• That was the easy part. Now you've got to deal with the defense. The first step is to divide defensive Win Shares between pitching and fielding. This is done through a complicated formula that accounts for FIP elements that can be attributed only to pitchers (home runs, walks and strikeouts) as well as a team's DER (Defensive Efficiency Ratio, adjusted for the ballpark) and other fielding statistics such as passed balls, errors and double plays. Typically, about 70% of defensive Win Shares are credited to pitching, and 30% to fielding. The Win Shares system is bound so that pitching never is credited with less than 60%, or more than 75%, of defensive Win Shares.

• Next, you allocate pitching Win Shares to individual pitchers. This is accomplished through an even more complicated formula that starts with each pitcher's marginal runs not allowed (same approach as team marginal runs not allowed), wins, losses and saves. Special consideration is given to relievers by estimating the number of high-leverage innings they pitched (ninth innings with one-run leads are more important than first innings with no score) and something called "Component ERA" which is essentially ERA re-calculated according to the actual underlying run elements.
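The first two steps above (marginal runs, then the offense/defense split of wins x 3) and the 60% to 75% bound on the pitching share can be sketched numerically. This is a simplified illustration, not the full Win Shares system: it assumes the runs passed in are already park-adjusted, it omits the formula that produces the raw pitching share, and the function names and sample team are hypothetical.

```python
def team_win_shares(runs_scored, runs_allowed, league_avg_runs, wins):
    """Split a team's Win Shares (wins * 3) between offense and defense."""
    # Runs scored above 52% of the league average are credited to the offense.
    offense_marginal = runs_scored - 0.52 * league_avg_runs
    # Runs allowed below 152% of the league average are credited to the defense.
    defense_marginal = 1.52 * league_avg_runs - runs_allowed
    total_marginal = offense_marginal + defense_marginal
    total_shares = 3 * wins
    offense_shares = total_shares * offense_marginal / total_marginal
    return offense_shares, total_shares - offense_shares


def bounded_pitching_share(raw_share):
    """Keep pitching's part of defensive Win Shares between 60% and 75%."""
    return min(0.75, max(0.60, raw_share))


# Hypothetical team: 800 runs scored, 700 allowed, league average 750, 90 wins.
offense, defense = team_win_shares(800, 700, 750, 90)
pitching = defense * bounded_pitching_share(0.70)
print(f"offense = {offense:.1f}, defense = {defense:.1f}, pitching = {pitching:.1f}")
```

Because the offensive and defensive shares always sum to three times the team's wins, the system can only redistribute credit among players; it never creates more credit than the team actually earned.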


Continued…

• Finally, pitchers are deducted Win Shares if they are absolutely lousy hitters. Call this the "Dean Chance" factor. All these elements are then mixed together in a complicated formula to allocate pitching Win Shares to individual pitchers. As in offensive Win Shares, any pitcher who gives up more than 152% of league-average Runs Scored (adjusted for ballpark) does not receive any credit for pitching Win Shares.

• One note: responsibility for unearned runs is split 50/50 between pitching and fielding.

• Which leads us to the next, most complicated step: allocating fielding Win Shares to fielding positions, and then to individual fielders. The calculations differ for each position. Essentially, James has selected four defensive statistics to evaluate positions. Here they are by position, listed in order of importance:

– Catchers: Caught Stealing, Errors, Passed Balls and Sacrifice Hits Allowed

– First Basemen: Plays Made, Errors, Arm Rating and Errors by third basemen and shortstops

– Second Basemen: Double Plays, Assists, Errors and Putouts

– Shortstops: Assists, Double Plays, Errors and Putouts

– Third Basemen: Assists, Errors, Sacrifice Hits Allowed and Double Plays

– Outfielders: Putouts, Team DER, Arm Elements and Assists and Errors

• Lots of things to note about the fielding calculations.

– First, the statistics are adjusted based on the number of innings a lefthander pitches for the team, which has an impact on which side of the field batters hit the ball to.

– Second, these stats are calculated as a proportion of the team's total, divided by the league-average proportions of the total. In other words, if a shortstop has 50 assists and his team has 100 assists in total, he receives just as much credit as the shortstop who has 100 assists and plays on a team with 200 assists in total. This is important, because it adjusts the fielding stats for the fact that fielders may be playing behind pitchers with certain tendencies such as giving up more ground balls vs. fly balls.

– Third, double plays are only factored in as a proportion of potential double plays. If teams don't have a lot of runners on first, they have less of a chance to turn double plays, and Win Shares takes this into account.

– Fourth, team DER is used to credit outfielders with fielding Win Shares because it is James' observation that outfielders have a much larger impact on DER than infielders. James acknowledges that there is some "circular logic" here.

– Fifth, there is a final element included in the formula to allocate fielding Win Shares to individual fielders. This element is called "Range Bonus Play." It particularly impacts outfielders in the following manner: if one outfielder handles more opportunities per inning played than the other outfielders on the team, he will be credited with more fielding Win Shares. This especially impacts centerfielders, who typically handle more chances per inning played than the corner outfielders.


Markov RPG


NPERA vs. OPS Against

[Scatter plot: NPERA on the x-axis, OPS Against on the y-axis]


NPERA vs. ERA

[Scatter plot: NPERA on the x-axis, ERA on the y-axis]


Correlation between various individual hitter statistics

             OPS    RC     RC/27   Win Shares   Markov RPG
OPS          1
RC           .912   1
RC/27        .967   .937   1
Win Shares   .752   .811   .778    1
Markov RPG   .972   .906   .982    .749         1
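Each entry in a matrix like the one above is a pairwise correlation between two stats measured on the same set of hitters. The slide does not say which coefficient was used; the sketch below assumes the ordinary Pearson correlation, and the function name and sample values are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / sqrt(var_x * var_y)


# Hypothetical OPS and RC/27 values for five hitters, invented for illustration.
ops_values = [0.650, 0.720, 0.780, 0.850, 0.950]
rc27_values = [3.4, 4.3, 4.9, 6.2, 7.8]
print(f"r = {pearson_r(ops_values, rc27_values):.3f}")
```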


Mythbusters: The Contract Year

• It is a commonly held belief that players perform better during the final year of their contract in the hopes that a good year will enable them to sign a lucrative new deal


Difference in Means Testing

P-values by contrast and statistic

Contrasts     Batting Average   HR/PA      OPS        Runs Created/27   Markov RPG
All Players   0.502779          0.628555   0.250987   0.144358          0.330044
A - Players   0.842572          0.938286   0.772146   0.034324          0.907109
B - Players   0.333571          0.589378   0.070181   0.000004          0.145831


• RC/27 returns significant results for A players when tested alone and B players when tested alone. These results show that the mean for A players increases, on average, from 2.991 to 3.344 RC/27, whereas B players tend to decrease, on average, from 1.824 to 1.375. The fact that these two groups of players have a tendency to move in opposite directions in this respect explains why the results are not statistically significant when compared en masse.

• OPS and Markov RPG actually increase AFTER signing a new contract!
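The slides report p-values from a difference-in-means test but do not say which test was run. As one plausible reading, the sketch below computes a paired t statistic, comparing each player's contract-year value of a stat with the same player's value the following year. The pairing, the function name and the sample numbers are all assumptions made for illustration; turning the statistic into a p-value would additionally require the t distribution with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(contract_year, next_year):
    """t statistic for the mean of per-player differences (next - contract)."""
    if len(contract_year) != len(next_year) or len(contract_year) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [after - before for before, after in zip(contract_year, next_year)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


# Hypothetical RC/27 values for five players, invented for illustration only.
contract_year = [2.9, 3.1, 2.7, 3.4, 3.0]
next_year = [3.3, 3.2, 3.1, 3.6, 3.5]
print(f"t = {paired_t_statistic(contract_year, next_year):.2f}")
```

A paired design is one way to see why the A and B groups can each show a significant shift while the pooled sample does not: opposite-signed differences largely cancel when the groups are combined.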


Mythbusters 2: Waiting for your Pitch

• Another commonly held perception is that batters who “wait for their pitch” are more likely to get a hit and that, when they do hit the ball, it will go farther (perhaps resulting in more home runs)


Regression using Pitches per Plate Appearance to predict OPS

The regression equation is
OPS = 0.435 + 0.0833 P/PA

Predictor   Coef      SE Coef   T      P
Constant    0.43478   0.07606   5.72   0.000
P/PA        0.08328   0.01988   4.19   0.000

S = 0.101122   R-Sq = 4.5%   R-Sq(adj) = 4.2%

Regression using Pitches per Plate Appearance to predict RPG

The regression equation is
Markov RPG = 0.513 + 1.17 P/PA

Predictor   Coef     SE Coef   T      P
Constant    0.5125   0.9828    0.52   0.602
P/PA        1.1659   0.2568    4.54   0.000

S = 1.30664   R-Sq = 5.2%   R-Sq(adj) = 5.0%
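To read the two fitted lines, it can help to plug in a value. The sketch below uses the coefficients from the regression output on this slide; the function names and the chosen P/PA value of 3.8 are illustrative only.

```python
# Coefficients copied from the regression output on the slide.
OPS_INTERCEPT, OPS_SLOPE = 0.43478, 0.08328
RPG_INTERCEPT, RPG_SLOPE = 0.5125, 1.1659


def predict_ops(pitches_per_pa):
    """Fitted OPS for a given pitches-per-plate-appearance value."""
    return OPS_INTERCEPT + OPS_SLOPE * pitches_per_pa


def predict_markov_rpg(pitches_per_pa):
    """Fitted Markov RPG for a given pitches-per-plate-appearance value."""
    return RPG_INTERCEPT + RPG_SLOPE * pitches_per_pa


ppa = 3.8  # illustrative value, not a figure from the slides
print(f"P/PA = {ppa}: OPS = {predict_ops(ppa):.3f}, "
      f"Markov RPG = {predict_markov_rpg(ppa):.2f}")
```

The slopes are statistically significant, but with R-Sq around 5% the fitted value explains very little of the player-to-player variation, so these predictions should be treated as rough tendencies rather than forecasts.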


Correlations of A & B Groups of Players w/ OPS and RPG for 2008 season

                                  Correlation   P-value
A Players PPA vs. A Players OPS   .246          .003
A Players PPA vs. A Players RPG   .260          .001
B Players PPA vs. B Players OPS   .204          .002
B Players PPA vs. B Players RPG   .223          .001


Can We Predict Walks?

Test (1) vs (2)      Mean PPA (PPA1 / PPA2)   Mean Walks (1) / (2)   Est. Diff in Means (Total Walks)   T      P-Value
Top 1/3 vs Mid 1/3   4.109 / 3.814            49.6 / 41.0            8.55                               2.87   .004
Mid 1/3 vs Bot 1/3   3.814 / 3.530            41.0 / 27.9            13.10                              5.62   .000
Top 1/3 vs Bot 1/3   4.109 / 3.530            49.6 / 27.9            21.65                              8.35   .000


How About Home Runs?

• Divided players into thirds according to number of pitches seen per at bat.
– Those who saw the most pitches in the first third, those who saw the fewest pitches per at bat in the bottom third, and a middle third.

Players in the top group hit on average .03055 home runs per plate appearance, slightly higher than the .02938 of the middle group, and both are significantly higher than the .02282 of the bottom group.
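The grouping step described above can be sketched as a sort followed by a three-way split. The slide does not say whether the group figure is an average of per-player rates or a pooled rate; this sketch assumes the former. The field names (`ppa`, `hr`, `pa`), the function names and the sample players are hypothetical.

```python
def split_into_thirds(players):
    """Rank players by pitches seen (highest first) and cut into three groups."""
    if len(players) < 3:
        raise ValueError("need at least three players to form thirds")
    ranked = sorted(players, key=lambda p: p["ppa"], reverse=True)
    size = len(ranked) // 3
    top = ranked[:size]
    middle = ranked[size:len(ranked) - size]  # absorbs any remainder
    bottom = ranked[len(ranked) - size:]
    return top, middle, bottom


def mean_hr_rate(group):
    """Average of each player's home runs per plate appearance."""
    return sum(p["hr"] / p["pa"] for p in group) / len(group)


# Hypothetical players, invented for illustration only.
players = [
    {"name": "P1", "ppa": 4.2, "hr": 30, "pa": 600},
    {"name": "P2", "ppa": 3.5, "hr": 10, "pa": 500},
    {"name": "P3", "ppa": 3.9, "hr": 20, "pa": 500},
    {"name": "P4", "ppa": 4.0, "hr": 18, "pa": 600},
    {"name": "P5", "ppa": 3.6, "hr": 12, "pa": 600},
    {"name": "P6", "ppa": 3.8, "hr": 15, "pa": 500},
]
for label, group in zip(("top", "middle", "bottom"), split_into_thirds(players)):
    print(f"{label}: {mean_hr_rate(group):.4f} HR per PA")
```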


So should you wait for your pitch?

Summary of Changes 2007-2008

               ∆PPA     ∆Walks   ∆HRs      ∆OPS      ∆RPG
Total          .0326*   -.24     -.675     -.0176*   -.2241*
Increase PPA   .1725*   3.01*    -1.064*   -.019*    -.198
Decrease PPA   -.15*    -.168    -4.48*    -.01615   -.258*

* Indicates significance at 5%


Conclusions

• Changing the number of pitches seen per plate appearance does not necessarily increase a player’s raw performance measures. Rather, a player who sees an increase in the number of pitches per plate appearance from year to year will have a better change in performance relative to a player who sees a decrease in number of pitches per plate appearance from one year to the next.


Conclusions

• Players’ performance is not significantly better during a contract year; in fact, it may actually be worse.

• Increasing the number of pitches you see does not increase performance
– However, you will walk more
– If you see fewer pitches, you are more likely to do worse


Baseball keeps stats for EVERYTHING

• Hitting Stats
– Singles, doubles, triples, G/F, GIDP, HBP, LOB, R, SF, SB, TB

• Pitching Stats
– ERA, WHIP, GF, GS, K/9, BB/9, HLD, IBB, IP, CG, SHO, SV, SVO, WP

• Fielding Stats
– PO, A, TC, E


References and Works Cited

• Cover, Thomas, and Carroll Keilers, “An Offensive Earned-Run Average for Baseball,” Operations Research, Vol. 25, No. 5, September-October 1977, pp. 729-740.

• ESPN MLB Team Stats, ESPN Internet Ventures, 2009, http://sports.espn.go.com/mlb/stats/aggregate?statType=batting&seasonType=2&group=9&type=reg&sort=&split=0&season=2008

• Free Agent Tracker, ESPN Internet Ventures, 2009, http://sports.espn.go.com/mlb/features/freeagents?type=ranked&season=2008

• James, Bill, The Bill James Handbook, ACTA Sports, Skokie, Illinois, 2009.

• Krautmann, Anthony C., and Margaret Oppenheimer, “Contract Length and the Return to Performance in Major League Baseball,” Journal of Sports Economics, Vol. 3, No. 1, 2002, pp. 6-17.

• Lewis, Michael, Moneyball, W. W. Norton & Company, Inc., New York, New York, 2004.

• Sagarin, Jeff, Jeff Sagarin MLB Ratings, October 7, 2008, www.usatoday.com/sports/sagarin/majors08.htm

• Studeman, Dave, Major League Baseball Graphs, May 16, 2004, http://www.baseballgraphs.com/main/index.php/site/details/#sharecalc

• The Hardball Times, THT Win Shares, October 1, 2008, http://www.hardballtimes.com/thtstats/main/?view=winshares