Predicting an MVP
Brian King, Derek Zhang, Juleen Graham, Erin Henning, Ryan Haney
How is an MVP selected?
◼ From 1979-1995, NBA players voted for the MVP
◼ 1995-2010: votes came strictly from a panel of sportswriters and broadcasters in the US and Canada, each of whom cast a vote for 1st through 5th place selections
◼ 2010-: one ballot is cast by fans through online voting
(https://en.wikipedia.org/wiki/NBA_Most_Valuable_Player_Award)
Trends?
◼ What caused a change in trend from Centers/Forwards to Guards/Forwards?
Questions
◼ What are the most important statistical criteria for choosing an MVP?
◼ Can we create a model to predict the probability of an individual winning the MVP award?
Procedures
◼ Data from the 1991-1992 season to the 2015-2016 season
- The 150 players with the most playing time in each season
◼ Logistic Regression Model
◼ Used data from the 1991-1992 to 2012-2013 seasons to fit the model
◼ Predicted on the 2013-2014 to 2015-2016 seasons
◼ Compared the “order” of predictions to the true voting order
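The season-based split described above can be sketched as follows. This is a minimal illustration, not the presenters' code: the row layout, the placeholder players, and the helper name `split_by_season` are all assumptions made for the example.

```python
# Toy rows: (season_start_year, player, is_mvp). Placeholder values only,
# not the presenters' data set.
rows = [
    (1991, "Player A", 1),
    (2005, "Player B", 0),
    (2012, "Player C", 1),
    (2013, "Player D", 0),
    (2015, "Player E", 1),
]

# The 2012-2013 season starts in 2012, so it is the last training season.
LAST_TRAIN_SEASON = 2012


def split_by_season(rows, last_train_season):
    """Split rows into (train, test) by the season's starting year."""
    train = [r for r in rows if r[0] <= last_train_season]
    test = [r for r in rows if r[0] > last_train_season]
    return train, test


train, test = split_by_season(rows, LAST_TRAIN_SEASON)
print(len(train), "training rows;", len(test), "held-out rows")
```

Splitting by season rather than at random keeps every player from a held-out season out of the fitting data, which matches the fit-then-predict order listed above.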
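The Logistic Regression Model
$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$
Where p = probability that a player is the MVP, X_i = Predictor variable
Assumptions
● Binary Response variable (MVP or not)
● Continuous, Independent Explanatory variables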
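A logistic regression model turns a linear combination of the predictors into a probability through the logistic (sigmoid) function. The sketch below shows that step and the ranking that follows from it. The coefficients and player statistics are hypothetical placeholders, not the fitted values from the presentation.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(intercept, coefs, x):
    """P(MVP) for one player whose predictor values are x."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return sigmoid(z)


# Hypothetical coefficients for (points, assists, rebounds) per game.
# Placeholders for illustration, not the model fitted in the slides.
intercept = -12.0
coefs = [0.30, 0.20, 0.10]

# Hypothetical players and per-game statistics.
players = {
    "Player A": [30.0, 6.0, 7.0],
    "Player B": [18.0, 4.0, 5.0],
}

probs = {name: predict_proba(intercept, coefs, x) for name, x in players.items()}
ranking = sorted(probs, key=probs.get, reverse=True)
print(ranking)
```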
The Variables
◼ Points Per Game
◼ Blocks, Steals, Assists, Rebounds
◼ Effective Field Goal Percentage
◼ Position
◼ Personal Fouls, Age, Minutes Played, Turnovers, ...
2013-14 Season: All Stats
Actual
MVP: Kevin Durant
2nd: LeBron James
3rd: Blake Griffin
4th: Joakim Noah
5th: James Harden
Actual finish of the other predicted top-5 players: Kevin Love: 11th, Stephen Curry: 6th, LaMarcus Aldridge: 10th
Prediction
MVP: Kevin Love
2nd: LeBron James
3rd: Kevin Durant
4th: Stephen Curry
5th: LaMarcus Aldridge
Predicted rank of the other actual top-5 players: Blake Griffin: 12th, Joakim Noah: 31st, James Harden: 8th
2013-14 Season: MVStats
Prediction
MVP: Kevin Durant
2nd: LeBron James
3rd: Kevin Love
4th: Stephen Curry
5th: Chris Paul
Predicted rank of the other actual top-5 players: Blake Griffin: 7th, Joakim Noah: 23rd, James Harden: 9th
Actual
MVP: Kevin Durant
2nd: LeBron James
3rd: Blake Griffin
4th: Joakim Noah
5th: James Harden
Actual finish of the other predicted top-5 players: Kevin Love: 11th, Stephen Curry: 6th, Chris Paul: 7th
2014-15 Season
Prediction
MVP: Russell Westbrook
2nd: LeBron James
3rd: Chris Paul
4th: James Harden
5th: Stephen Curry
Predicted rank of the other actual top-5 player: Anthony Davis: 10th
Actual
MVP: Stephen Curry
2nd: James Harden
3rd: LeBron James
4th: Russell Westbrook
5th: Anthony Davis
Actual finish of the other predicted top-5 player: Chris Paul: 6th
2015-16 Season
Prediction
MVP: Stephen Curry
2nd: Russell Westbrook
3rd: LeBron James
4th: Kevin Durant
5th: James Harden
Predicted rank of the other actual top-5 player: Kawhi Leonard: 26th
Actual
MVP: Stephen Curry
2nd: Kawhi Leonard
3rd: LeBron James
4th: Russell Westbrook
5th: Kevin Durant
Actual finish of the other predicted top-5 player: James Harden: 9th
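One simple way to compare the predicted order with the true voting order is to check whether the MVP was predicted correctly and to count how many players appear in both top-5 lists. The sketch below uses the top-5 lists from the results above. The two summary measures and the helper name `summarize` are assumptions chosen for illustration, not a metric the presenters report.

```python
# Top-5 lists copied from the season results above (first entry is the MVP).
ACTUAL_2014 = ["Kevin Durant", "LeBron James", "Blake Griffin",
               "Joakim Noah", "James Harden"]

results = {
    "2013-14 (All Stats)": {
        "predicted": ["Kevin Love", "LeBron James", "Kevin Durant",
                      "Stephen Curry", "LaMarcus Aldridge"],
        "actual": ACTUAL_2014,
    },
    "2013-14 (MVStats)": {
        "predicted": ["Kevin Durant", "LeBron James", "Kevin Love",
                      "Stephen Curry", "Chris Paul"],
        "actual": ACTUAL_2014,
    },
    "2014-15": {
        "predicted": ["Russell Westbrook", "LeBron James", "Chris Paul",
                      "James Harden", "Stephen Curry"],
        "actual": ["Stephen Curry", "James Harden", "LeBron James",
                   "Russell Westbrook", "Anthony Davis"],
    },
    "2015-16": {
        "predicted": ["Stephen Curry", "Russell Westbrook", "LeBron James",
                      "Kevin Durant", "James Harden"],
        "actual": ["Stephen Curry", "Kawhi Leonard", "LeBron James",
                   "Russell Westbrook", "Kevin Durant"],
    },
}


def summarize(predicted, actual):
    """Return (was the MVP predicted correctly?, number of shared top-5 players)."""
    mvp_hit = predicted[0] == actual[0]
    overlap = len(set(predicted) & set(actual))
    return mvp_hit, overlap


summary = {season: summarize(r["predicted"], r["actual"])
           for season, r in results.items()}
for season, (mvp_hit, overlap) in summary.items():
    print(f"{season}: MVP correct={mvp_hit}, top-5 overlap={overlap}/5")
```

On these lists the MVP is predicted correctly for 2013-14 (MVStats) and 2015-16, and the top-5 overlap is 2 of 5 for both 2013-14 models and 4 of 5 for 2014-15 and 2015-16.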
Random Forests
◼ Decision Tree Learning
◼ Bootstrap Aggregating
◼ Random Subspace Method
Decision Tree Learning
[Figure: example decision tree. Nodes: Points Per Game (Pts < x, Pts > x), Assists Per Game (Assists < x, Assists > x), Rebounds Per Game. Leaves list 0/1 outcomes (MVP or not), e.g. 000000000 and 000100001.]
Algorithm chooses variable at each step that best splits the data into successes and failures
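The split search at a single node can be sketched as below. The slides do not name the splitting criterion, so this example assumes one: the size-weighted variance of the 0/1 response within each side, which for binary labels is proportional to Gini impurity. The data and the helper names are hypothetical.

```python
def variance(labels):
    """Variance of a list of 0/1 labels (0.0 for a pure group)."""
    mean = sum(labels) / len(labels)
    return sum((y - mean) ** 2 for y in labels) / len(labels)


def best_split(rows, labels):
    """Return (score, feature index, threshold) with the lowest weighted variance."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] < threshold]
            right = [y for r, y in zip(rows, labels) if r[j] >= threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * variance(left) + len(right) * variance(right)) / n
            if best is None or score < best[0]:
                best = (score, j, threshold)
    return best


# Toy data: columns are (points, assists) per game; label 1 = MVP.
rows = [(30.1, 5.0), (28.4, 8.9), (25.3, 6.3), (19.8, 9.4), (15.2, 3.1), (12.7, 7.8)]
labels = [1, 1, 0, 0, 0, 0]

score, feature, threshold = best_split(rows, labels)
print(feature, threshold, score)
```

In this toy data, points per game (column 0) separates the two groups perfectly at 28.4, so it is chosen over assists. A full tree would repeat the same search inside each resulting group.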
Bootstrap Aggregating
◼ A random forest consists of B randomized tree models, indexed b = 1, …, B
◼ Each model (tree) is built with a bootstrap sample of the original data (a sample of the same size as the original data, drawn with replacement)
◼ Training many trees on the same data set leads to problems (possibly recreating the same tree)
◼ Averaging the predictions from all the individual regression trees leads to better performance
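The bootstrap-and-average idea can be sketched as follows. To keep the example short, each "tree" is replaced by a one-split stump on a single predictor. The stump, the toy data, and the value B = 200 are assumptions for illustration, not the presenters' forest.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def fit_stump(sample):
    """Toy stand-in for a tree: split at the mean PPG, predict the mean response per side."""
    threshold = sum(x for x, _ in sample) / len(sample)
    overall = sum(y for _, y in sample) / len(sample)
    left = [y for x, y in sample if x < threshold]
    right = [y for x, y in sample if x >= threshold]
    left_mean = sum(left) / len(left) if left else overall
    right_mean = sum(right) / len(right) if right else overall
    return lambda x: left_mean if x < threshold else right_mean


def bagged_predict(models, x):
    """Average the predictions of all models."""
    return sum(m(x) for m in models) / len(models)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Toy data: (points per game, MVP indicator).
data = [(30.1, 1), (28.4, 1), (25.3, 0), (19.8, 0), (15.2, 0), (12.7, 0)]

B = 200
models = [fit_stump(bootstrap_sample(data, rng)) for _ in range(B)]

high = bagged_predict(models, 31.0)
low = bagged_predict(models, 10.0)
print(high, low)
```

Because each stump sees a different resample, the individual predictions differ, and the average is smoother than any single stump's output.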
Random Forest Interpretation
◼ Samples not included in a given bootstrap sample are called “out-of-bag” samples
◼ %IncMSE “=” how much worse the predictions are when a permuted version of the variable is used instead of the true values
◼ Build tree, make predictions using “real” data values, record the error (MSE) of this
◼ Permute values of variable in the out-of-bag sample, re-do predictions, recompute MSE
◼ %IncMSE is how much the error increases for the permuted samples vs the true samples
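The permutation steps above can be sketched as below. This is a simplified version: it reports the raw increase in MSE averaged over several shuffles, not a percentage or any scaled value a particular package might report. The stand-in model, the toy out-of-bag rows, and the helper names are hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of model predictions on rows."""
    return sum((model(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, column, rng, repeats=20):
    """Average increase in MSE after shuffling one column of the rows."""
    baseline = mse(model, rows, targets)
    increases = []
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled)]
        increases.append(mse(model, permuted, targets) - baseline)
    return sum(increases) / repeats


def model(row):
    """Stand-in for a fitted tree: uses only column 0 (points per game)."""
    return 1.0 if row[0] >= 28.0 else 0.0


# Stand-in for out-of-bag rows: (points, rebounds) per game; target 1 = MVP.
oob_rows = [(30.1, 7.0), (28.4, 11.2), (25.3, 5.1), (19.8, 9.9), (15.2, 4.0), (12.7, 8.3)]
oob_targets = [1, 1, 0, 0, 0, 0]

rng = random.Random(1)  # fixed seed so the run is repeatable
ppg_importance = permutation_importance(model, oob_rows, oob_targets, 0, rng)
rpg_importance = permutation_importance(model, oob_rows, oob_targets, 1, rng)
print(ppg_importance, rpg_importance)
```

Shuffling a column the model ignores (rebounds here) leaves the error unchanged, while shuffling the column it relies on raises the error, which is the contrast the importance measure is meant to capture.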
Most Important MVP Variables
According to the Random Forest method:
% Increase MSE
PPG   0.0020350589
APG   0.0010963324
MPG   0.0010791207
SPG   0.0010197867
PFPG  0.0007518895
TPG   0.0007515411
eFG.  0.0007482838
BPG   0.0006166867
Age   0.0002998612
RPG   0.0001863932
POS   0.0001672975
According to Logistic Regression:
Z-score (absolute value)
PPG   5.167
RPG   3.395
PFPG  3.08
APG   2.58
Age   2.291
eFG.  2.104
BPG   1.566
POS   1.459
MPG   0.62
TPG   0.314
SPG   0.128
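The two importance lists above can be put side by side by converting each to ranks. The sketch below does only that, using the values reported above. The helper name `ranks` is an assumption for the example.

```python
# Values copied from the two importance lists above.
rf_inc_mse = {
    "PPG": 0.0020350589, "APG": 0.0010963324, "MPG": 0.0010791207,
    "SPG": 0.0010197867, "PFPG": 0.0007518895, "TPG": 0.0007515411,
    "eFG.": 0.0007482838, "BPG": 0.0006166867, "Age": 0.0002998612,
    "RPG": 0.0001863932, "POS": 0.0001672975,
}
lr_abs_z = {
    "PPG": 5.167, "RPG": 3.395, "PFPG": 3.08, "APG": 2.58, "Age": 2.291,
    "eFG.": 2.104, "BPG": 1.566, "POS": 1.459, "MPG": 0.62, "TPG": 0.314,
    "SPG": 0.128,
}


def ranks(scores):
    """Map each variable to its rank (1 = most important) by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


rf_rank = ranks(rf_inc_mse)
lr_rank = ranks(lr_abs_z)
for name in sorted(rf_rank, key=rf_rank.get):
    print(f"{name:5s} RF rank {rf_rank[name]:2d}  LR rank {lr_rank[name]:2d}")
```

Both methods put PPG first, but they disagree elsewhere: RPG ranks 2nd by z-score and 10th by the random forest measure, while SPG ranks 4th in the random forest and last by z-score.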
Drawbacks to our models
◼ Only one MVP can be crowned every year
◼ Predictions using our models assume that the response variable (MVP or not) is independent between players
◼ As a result, the predicted probabilities within a season do not sum to 1
◼ Our models can rank players by likelihood of winning MVP, but cannot give explicit probabilities
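A small numeric illustration of this drawback: when each player's probability is produced independently, nothing forces the values for one season to add up to 1, yet the ordering is still usable. The probabilities below are hypothetical placeholders, not model output.

```python
# Hypothetical per-player probabilities for one season, each produced
# independently of the others (placeholder values, not model output).
season_probs = {"Player A": 0.62, "Player B": 0.55, "Player C": 0.08}

total = sum(season_probs.values())
ranking = sorted(season_probs, key=season_probs.get, reverse=True)

print(f"sum of probabilities = {total:.2f}")
print("ranking:", ranking)
```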
Conclusions
◼ The most important variables are:
- Points Per Game
- Assists Per Game
- Rebounds Per Game
◼ The least important variables include:
- Blocks Per Game
- Steals Per Game
◼ The problem with defensive production
◼ MVP Voting: Stat-Driven, but not completely
◼ Steve Nash, 2005
Future Work
◼ Further research into possible interaction between variables
◼ Better interpretability of logistic regression predictions
◼ Impact of team on MVP prospects
◼ Change in MVP selection criteria over the years
◼ Changes in rules over the years
◼ Growing data set and possible outcomes
Thanks!
Questions?