SPORTS ANALYTICS IN THE ERA OF BIG DATA AND DATA SCIENCE
KONSTANTINOS PELECHRINIS
@kpelechrinis https://412sportsanalytics.wordpress.com
DATA-DRIVEN COACHES?
DATA-DRIVEN FRONT OFFICES?
WHY NOW?
➤ Data analysis & use of statistics is not new in sports!!
➤ Now we have the technology to collect much more detailed information about the game
➤ Detailed box score
➤ Play-by-play data
➤ Player tracking
TRACKING
RESOURCES
Some of the examples are taken from this book
SPORT MARKETS
➤ A typical business or firm operates with the objective of profit maximization
➤ This might not be the case for the owner of a professional sports team!!
➤ Maximize profit year by year
➤ Maximize wins
➤ Capital appreciation
SPORT MARKETS
➤ Becoming the dominant player is not the goal in the sports industry
➤ If a team were assured of victory in almost every competition, the whole league would be of little, if any, interest
➤ Competitive balance
➤ Salary cap!
➤ Draft!
SPORT MARKETS
[Figure: Percentage of games won vs. team payroll (millions of dollars), one point per MLB team]
Correlation coefficient = 0.26, p-value = 0.16 (not significant at the 5% level)!
Only 6% of the variation in win/loss percentage is explained by payroll differences!
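Where the "% explained" figure comes from: in a simple linear regression of win percentage on payroll, the share of variance explained (R squared) equals the square of the Pearson correlation coefficient. The sketch below computes both with the standard library only; the payroll and win-percentage values are hypothetical placeholders, not the data in the plot.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical payrolls (millions of dollars) and win percentages.
payroll = [60, 95, 120, 150, 210, 250]
win_pct = [46, 52, 49, 55, 51, 58]

r = pearson_r(payroll, win_pct)
r_squared = r ** 2  # share of win% variance explained by a linear fit on payroll
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")
```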
RANKING TEAMS
➤ Team performance is central to sports data science
➤ Ratings and rankings
➤ Challenges
➤ Imbalance in team schedules
➤ Win/loss percentage does not account for strength of schedule
RANKING TEAMS
➤ Network-based solution
➤ Win/loss directed network
➤ PageRank
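A minimal sketch of the network idea, assuming each game adds a directed edge from the loser to the winner (so a loss acts as a "vote" for the team that won) and using the standard PageRank power iteration with a damping factor of 0.85. The function name, the damping value, and the game results are illustrative assumptions, not the setup behind these slides.

```python
from collections import defaultdict


def pagerank_from_games(games, damping=0.85, iterations=100):
    """PageRank scores on a loser -> winner network.

    games: list of (winner, loser) tuples. Rank flows toward teams
    that beat other (highly ranked) teams.
    """
    teams = sorted({team for game in games for team in game})
    out_weight = defaultdict(float)  # number of losses per team
    edges = defaultdict(lambda: defaultdict(float))
    for winner, loser in games:
        edges[loser][winner] += 1.0
        out_weight[loser] += 1.0

    n = len(teams)
    rank = {t: 1.0 / n for t in teams}
    for _ in range(iterations):
        # Undefeated teams have no outgoing edges; spread their rank evenly.
        dangling = sum(rank[t] for t in teams if out_weight[t] == 0)
        new_rank = {t: (1 - damping) / n + damping * dangling / n for t in teams}
        for loser, targets in edges.items():
            for winner, weight in targets.items():
                new_rank[winner] += damping * rank[loser] * weight / out_weight[loser]
        rank = new_rank
    return rank


# Hypothetical results as (winner, loser).
games = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "A")]
ranks = pagerank_from_games(games)
for team, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(team, round(score, 3))
```

Unlike raw win percentage, a win over a highly ranked team passes on more rank than a win over a weak one, which is how this approach reflects schedule strength.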
RANKING TEAMS
➤ Unidimensional scaling
➤ Matrix of how many times each team beats the other
➤ Convert to proportions, average across rows (or columns), and standardize the result
➤ Automatic adjustment for schedule strength
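The steps listed above can be sketched as follows. This is a hypothetical implementation of those steps, not the book's code: win counts become per-pairing win proportions, each team's proportions are averaged across its row, and the averages are standardized to z-scores. Because every opponent contributes one proportion no matter how often the two teams met, an unbalanced schedule weighs less than it does in a raw win percentage.

```python
from statistics import mean, pstdev


def unidimensional_scaling(wins):
    """Rating score per team from a head-to-head win-count matrix.

    wins[i][j] is the number of times team i beat team j.
    Assumes every team played at least one opponent and that the
    row averages are not all identical (otherwise sigma is zero).
    """
    n = len(wins)
    row_means = []
    for i in range(n):
        proportions = []
        for j in range(n):
            played = wins[i][j] + wins[j][i]
            if i != j and played > 0:  # skip self and pairs that never met
                proportions.append(wins[i][j] / played)
        row_means.append(mean(proportions))  # average across the row
    mu, sigma = mean(row_means), pstdev(row_means)
    return [(m - mu) / sigma for m in row_means]  # standardize to z-scores


# Hypothetical season series among three teams.
teams = ["A", "B", "C"]
wins = [
    [0, 3, 2],  # A beat B 3 times and C twice
    [1, 0, 2],  # B beat A once and C twice
    [0, 2, 0],  # C beat B twice
]
scores = unidimensional_scaling(wins)
for team, score in sorted(zip(teams, scores), key=lambda ts: -ts[1]):
    print(team, round(score, 2))
```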
RANKING TEAMS
[Figure: Ranking score for each NBA team (ATL through WAS)]
COACHING DECISIONS
➤ Evidence-based coaching
➤ Go for it on 4th down or not?
➤ Go for the 2-point conversion or kick the (cheap) extra point?
➤ Shoot for three to win or shoot for two to tie the game?
➤ …
➤ We can now quantify the rationality of coaches!
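As an illustration of quantifying one of these decisions, take the basketball case of trailing by two on the last possession. Under simplifying assumptions (no time left for the opponent, no offensive rebounds or fouls), a made three wins outright, while a made two only forces overtime, so its success rate has to be multiplied by the probability of winning in overtime. The shooting percentages and overtime probability below are hypothetical.

```python
def win_prob_shoot_three(p_three):
    """Down by 2 at the buzzer: a made three wins the game outright."""
    return p_three


def win_prob_shoot_two(p_two, p_win_overtime):
    """A made two only ties the game; the team must also win in overtime."""
    return p_two * p_win_overtime


# Hypothetical inputs: make rates and a coin-flip overtime.
p3, p2, p_ot = 0.35, 0.50, 0.50
go_for_three = win_prob_shoot_three(p3) > win_prob_shoot_two(p2, p_ot)
print("shoot for three" if go_for_three else "shoot for two")
```

With these inputs the three is the better choice because a 35% shot beats a 50% shot that only wins half the time afterwards.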
COACHING DECISIONS
$E[p] = 2 \cdot P(\text{2-point conversion succeeds}) - 1 \cdot P(\text{PAT succeeds})$
[Figure: Expected point gain for each NFL team (ARI through WAS)]
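The E[p] comparison can be computed from a team's success rates. A minimal sketch, assuming E[p] is read as 2 times the 2-point conversion success probability minus 1 times the PAT success probability, each estimated as an empirical rate; the team totals below are hypothetical. A positive value means going for two gains points on average.

```python
def expected_point_gain(two_pt_made, two_pt_att, pat_made, pat_att):
    """Expected extra points per touchdown from going for two instead of kicking.

    E[p] = 2 * P(2-point success) - 1 * P(PAT success),
    with both probabilities estimated as empirical success rates.
    """
    p_two = two_pt_made / two_pt_att
    p_pat = pat_made / pat_att
    return 2 * p_two - 1 * p_pat


# Hypothetical team totals: 6 of 12 two-point tries, 38 of 40 PATs.
gain = expected_point_gain(6, 12, 38, 40)
print(f"expected point gain per touchdown: {gain:+.2f}")
```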
COACHING DECISIONS
[Figure: Expected points gained vs. distance to the goal line on 4th down (yards), with the touchback marked]
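The same expected-value reasoning applies on 4th down. This sketch assumes expected-points (EP) values are already available, from some expected-points model not specified here, for three outcomes: converting, failing (turnover on downs), and kicking (punt or field goal). All numbers are hypothetical.

```python
def expected_points_go(p_convert, ep_if_convert, ep_if_fail):
    """Expected points of going for it on 4th down."""
    return p_convert * ep_if_convert + (1 - p_convert) * ep_if_fail


def fourth_down_gain(p_convert, ep_if_convert, ep_if_fail, ep_if_kick):
    """Expected points gained by going for it instead of kicking.

    Positive values favour going for it.
    """
    return expected_points_go(p_convert, ep_if_convert, ep_if_fail) - ep_if_kick


# Hypothetical inputs for a 4th-and-short near midfield.
gain = fourth_down_gain(p_convert=0.6, ep_if_convert=2.5, ep_if_fail=-1.5, ep_if_kick=0.3)
print(f"going for it: {gain:+.2f} expected points vs. kicking")
```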
COMPUTATIONAL GAME MODELS
[Figure, left panel: ratio r by quarter (Q1-Q4). Right panel: turnover density vs. time (minutes), by quarter]
COMPUTATIONAL GAME MODELS
[Figure: Model pipeline with components historical game data, bootstrap (B samples), correlation matrix, and logistic regression model; bootstrap samples $x_{11},\dots,x_{B1}$ and $x_{12},\dots,x_{B2}$ give $P_1$ and $P_2$, which are compared by testing $H_0: P_1 = P_2$ against $H_1: P_1 \neq P_2$ (p-value)]
[Figure: Weekly prediction accuracy (weeks 8-17) for the 2014 and 2015 seasons; panel annotations report mean accuracies of 0.627, 0.787, 0.517 and 0.6]
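One way to test H0: P1 = P2 for two prediction models is a paired bootstrap over games. The sketch below is an assumption about how such a test could be set up, not the author's pipeline: it resamples games with replacement, records the accuracy difference in each resample, and approximates a two-sided p-value from how often that difference falls on either side of zero. The per-game outcomes are hypothetical.

```python
import random


def bootstrap_accuracy_test(correct_a, correct_b, n_boot=2000, seed=0):
    """Paired bootstrap comparison of two models' prediction accuracy.

    correct_a[i] / correct_b[i] are 1 if model A / B predicted game i
    correctly, else 0. Returns (mean accuracy difference, approximate
    two-sided p-value for H0: equal accuracy).
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need paired, non-empty results")
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample games
        acc_a = sum(correct_a[i] for i in idx) / n
        acc_b = sum(correct_b[i] for i in idx) / n
        diffs.append(acc_a - acc_b)
    below = sum(d <= 0 for d in diffs) / n_boot
    above = sum(d >= 0 for d in diffs) / n_boot
    p_value = min(1.0, 2 * min(below, above))
    return sum(diffs) / n_boot, p_value


# Hypothetical per-game outcomes (1 = correct prediction).
model_1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
model_2 = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1]
diff, p = bootstrap_accuracy_test(model_1, model_2)
print(f"mean accuracy difference {diff:.3f}, p-value {p:.3f}")
```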
LEAGUE CHANGES
➤ Can we predict and/or evaluate the impact of a rule change?
➤ What if we move the three-point line further away?
➤ What was the impact of the new PAT rule?
➤ Will the new touchback rule give an advantage to the offense?
LEAGUE CHANGES
Should the 3-point line be moved further away?
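A quick way to frame the question: a three keeps its edge over a two only while 3·p3 > 2·p2, i.e. while the three-point make rate stays above two-thirds of the two-point rate. Moving the line back lowers p3, so the break-even rate shows how much accuracy can be lost before the three stops paying off. This sketch counts only shot value (no fouls, rebounds, or spacing effects) and uses hypothetical league averages.

```python
def points_per_shot(make_prob, shot_value):
    """Expected points from one field-goal attempt."""
    return make_prob * shot_value


def break_even_three_pct(two_pt_pct):
    """Three-point make rate at which a three is worth exactly as much as a two.

    Solves 3 * p3 = 2 * p2 for p3.
    """
    return 2 * two_pt_pct / 3


# Hypothetical league averages.
p2, p3 = 0.50, 0.36
print(points_per_shot(p3, 3) > points_per_shot(p2, 2))  # does the three still pay more?
print(round(break_even_three_pct(p2), 3))
```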
SPORTS MARKETING
➤ Sports are part of the entertainment market
➤ Marketing decisions can always benefit from good data!
➤ What price should the ticket have?
➤ What team-branded merchandise should you sell?
➤ Does a swag promotion justify a higher ticket price?
➤ What is the best strategy for national branding?
➤ …
SPORTS MARKETING
➤ Case study: Consumer preferences for Dodger Stadium seating
➤ Conjoint analysis
➤ Product profiles
➤ Consumers rank the products
➤ Ranking reveals their preference
SPORTS MARKETING
Part-worths (i.e., regression coefficients) reflect the strength of consumer preferences for each level of each product attribute.
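A minimal sketch of estimating part-worths, assuming a balanced full-factorial design and a numeric preference score per profile (for example, a reversed rank). In that case the main-effects regression coefficients under effects (sum-to-zero) coding equal each level's mean score minus the grand mean, so no regression library is needed. The attributes, levels, and scores are hypothetical and much smaller than the Dodger Stadium study.

```python
from collections import defaultdict
from itertools import product
from statistics import mean


def part_worths(profiles, scores):
    """Main-effects part-worths for a balanced full-factorial conjoint design.

    profiles: list of dicts mapping attribute -> level.
    scores: preference score for each profile (higher = preferred).
    Returns {(attribute, level): level mean score - grand mean}.
    """
    grand = mean(scores)
    by_level = defaultdict(list)
    for profile, score in zip(profiles, scores):
        for attr, level in profile.items():
            by_level[(attr, level)].append(score)
    return {key: mean(vals) - grand for key, vals in by_level.items()}


# Hypothetical design: every combination of price and seat type.
prices = ["$20", "$95"]
seats = ["field", "loge"]
profiles = [{"price": p, "seat": s} for p, s in product(prices, seats)]
scores = [4, 2, 3, 1]  # one hypothetical consumer's preference scores
pw = part_worths(profiles, scores)
for (attr, level), value in sorted(pw.items()):
    print(attr, level, round(value, 2))
```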
SPORTS MARKETING
➤ Can we use these results to assess a consumer's willingness to pay for tickets?
➤ $20 tickets have a part-worth of 3.25, while $95 tickets have a part-worth of -3.50.
➤ The difference in part-worth is 3.25 - (-3.50) = 6.75, which corresponds to a price difference of $75
➤ So 1 unit of part-worth is worth $75 / 6.75 ≈ $11.11 to the consumer
➤ For this consumer, the part-worth differential between a loge seat and a field seat is 2.75
➤ This consumer is willing to pay 2.75 × $11.11 ≈ $30.55 more for a field seat than for a loge seat
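The willingness-to-pay arithmetic, using the part-worths quoted on this slide ($20 tickets at 3.25, $95 tickets at -3.50, and a 2.75 field-versus-loge differential). The part-worth gap between the two prices is 3.25 - (-3.50) = 6.75 units for $75. The function names are illustrative; carrying the unrounded rate gives about $30.56 rather than $30.55, a rounding difference only.

```python
def dollars_per_part_worth(price_low, pw_low, price_high, pw_high):
    """Dollar value of one unit of part-worth, from two price levels."""
    return (price_high - price_low) / (pw_low - pw_high)


def willingness_to_pay(pw_difference, dollars_per_unit):
    """Extra amount a consumer would pay for a given part-worth advantage."""
    return pw_difference * dollars_per_unit


# Part-worths from the slide: $20 -> 3.25, $95 -> -3.50; field vs. loge gap 2.75.
rate = dollars_per_part_worth(20, 3.25, 95, -3.50)  # $75 / 6.75 units
extra = willingness_to_pay(2.75, rate)
print(f"${rate:.2f} per part-worth unit; ${extra:.2f} more for a field seat")
```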
PROMOTING BRANDS & PRODUCTS
= a* + b* + c* + d
DATA SOURCES
➤ There are various websites where you can get data
➤ Mainly aggregate statistics, box scores, etc.
DATA SOURCES
➤ Flexibility → play-by-play data
➤ Major leagues provide an API
➤ Sports enthusiasts have created libraries to access them
Case study: nflgame in Python
https://github.com/BurntSushi/nflgame
DATA SOURCES
>>> import nflgame
>>> games = nflgame.games(2015, week=1, kind='REG')
>>> games
[<nflgame.game.Game object at 0x107652210>, <nflgame.game.Game object at 0x107652310>, <nflgame.game.Game object at 0x107652410>, <nflgame.game.Game object at 0x107652510>, <nflgame.game.Game object at 0x107652610>, <nflgame.game.Game object at 0x107652710>, <nflgame.game.Game object at 0x107652810>, <nflgame.game.Game object at 0x107652910>, <nflgame.game.Game object at 0x107652a10>, <nflgame.game.Game object at 0x107652b10>, <nflgame.game.Game object at 0x107652c10>, <nflgame.game.Game object at 0x107652d10>, <nflgame.game.Game object at 0x107652e10>, <nflgame.game.Game object at 0x107652f10>, <nflgame.game.Game object at 0x107d02050>, <nflgame.game.Game object at 0x107d02150>]
>>> games[0].home
u'NE'
>>> games[0].away
u'PIT'
>>> games[0].score_home
28
>>> games[0].score_away
21
DATA SOURCES
>>> for drive in games[0].drives:
...     print(drive)
...
PIT (Start: Q1 15:00, End: Q1 09:40) Missed FG
NE (Start: Q1 09:40, End: Q1 07:41) Punt
PIT (Start: Q1 07:41, End: Q1 03:14) Punt
NE (Start: Q1 03:14, End: Q2 11:11) Touchdown
PIT (Start: Q2 11:11, End: Q2 08:38) Missed FG
NE (Start: Q2 08:38, End: Q2 04:01) Touchdown
PIT (Start: Q2 04:01, End: Q2 00:03) Field Goal
NE (Start: Q2 00:03, End: Q2 00:00) End of Half
NE (Start: Q3 15:00, End: Q3 10:37) Touchdown
PIT (Start: Q3 10:37, End: Q3 06:43) Touchdown
NE (Start: Q3 06:43, End: Q3 04:15) Punt
PIT (Start: Q3 04:15, End: Q4 11:39) Field Goal
NE (Start: Q4 11:39, End: Q4 09:20) Touchdown
PIT (Start: Q4 09:20, End: Q4 08:29) Punt
NE (Start: Q4 08:29, End: Q4 07:29) Punt
PIT (Start: Q4 07:29, End: Q4 07:00) Interception
NE (Start: Q4 07:00, End: Q4 02:59) Punt
PIT (Start: Q4 02:59, End: Q4 00:02) Touchdown
NE (Start: Q4 00:02, End: Q4 00:00) End of Game
DATA SOURCES
>>> plays = nflgame.combine_plays(games)
>>> for play in plays:
...     print(play)
...
(NE, NE 35, Q1) S.Gostkowski kicks 65 yards from NE 35 to end zone, Touchback.
(PIT, PIT 20, Q1, 1 and 10) (15:00) De.Williams right tackle to PIT 38 for 18 yards (D.Hightower).
(PIT, PIT 38, Q1, 1 and 10) (14:21) B.Roethlisberger pass short right to A.Brown pushed ob at PIT 47 for 9 yards (D.Hightower).
(PIT, PIT 47, Q1, 2 and 1) (14:04) De.Williams right guard to NE 49 for 4 yards (J.Collins; M.Brown).
(PIT, NE 49, Q1, 1 and 10) (13:26) B.Roethlisberger pass short right to H.Miller to NE 35 for 14 yards (J.Mayo).
(PIT, NE 35, Q1, 1 and 10) (12:42) (Shotgun) De.Williams right guard to NE 24 for 11 yards (J.Collins).
(PIT, NE 24, Q1, 1 and 10) (12:05) A.Brown sacked at NE 32 for -8 yards (M.Brown).
(PIT, NE 32, Q1, 2 and 18) (11:20) (Shotgun) De.Williams right end pushed ob at NE 28 for 4 yards (D.Hightower). PENALTY on PIT-M.Gilbert, Offensive Holding, 10 yards, enforced at NE 32 - No Play.
(PIT, NE 42, Q1, 2 and 28) (10:53) W.Johnson right guard to NE 36 for 6 yards (R.Ninkovich). NE-D.Easley was injured during the play. He is Out.
(PIT, NE 36, Q1, 3 and 22) (10:28) (Shotgun) B.Roethlisberger pass short right to H.Miller to NE 26 for 10 yards (P.Chung; M.Butler).
(PIT, NE 26, Q1, 4 and 12) (9:44) J.Scobee 44 yard field goal is No Good, Wide Right, Center-G.Warren, Holder-J.Berry.
(NE, NE 34, Q1, 1 and 10) (9:40) (Shotgun) T.Brady pass short left to J.Edelman pushed ob at NE 47 for 13 yards (W.Gay). PENALTY on NE-N.Solder, Unnecessary Roughness, 15 yards, enforced between downs.
(NE, NE 32, Q1, 1 and 10) (9:14) (Shotgun) T.Brady pass short left to D.Lewis to NE 44 for 12 yards (J.Harrison).
(NE, NE 44, Q1, 1 and 10) (9:00) (No Huddle, Shotgun) T.Brady pass short left to D.Lewis ran ob at PIT 43 for 13 yards.
(NE, PIT 43, Q1, 1 and 10) (8:31) (No Huddle, Shotgun) T.Brady pass incomplete short right to R.Gronkowski.
(NE, PIT 43, Q1, 2 and 10) (8:27) T.Brady pass incomplete deep right to D.Amendola.
(NE, PIT 43, Q1, 3 and 10) (8:22) (Shotgun) T.Brady sacked at PIT 43 for 0 yards (B.Dupree).
(NE, PIT 43, Q1, 4 and 10) (7:48) R.Allen punts 36 yards to PIT 7, Center-J.Cardona, fair catch by A.Brown.
(PIT, PIT 7, Q1, 1 and 10) (7:41) De.Williams left guard to PIT 13 for 6 yards (A.Branch; G.Grissom).
(PIT, PIT 13, Q1, 2 and 4) (7:07) De.Williams left tackle to PIT 12 for -1 yards (C.Jones).
(PIT, PIT 12, Q1, 3 and 5) (6:26) (Shotgun) B.Roethlisberger pass short left to A.Brown pushed ob at PIT 22 for 10 yards (D.McCourty).
(PIT, PIT 22, Q1, 1 and 10) (5:54) De.Williams right guard to PIT 26 for 4 yards (R.Ninkovich). PENALTY on PIT-K.Beachum, Illegal Formation, 5 yards, enforced at PIT 22 - No Play.
(PIT, PIT 17, Q1, 1 and 15) (5:29) (Shotgun) B.Roethlisberger pass short right to A.Brown to PIT 20 for 3 yards (J.Collins).
(PIT, PIT 20, Q1, 2 and 12) (4:48) B.Roethlisberger sacked at PIT 14 for -6 yards (D.Hightower).
(PIT, PIT 14, Q1, 3 and 18) (4:03) (Shotgun) B.Roethlisberger pass deep left to H.Miller to PIT 31 for 17 yards (D.McCourty; T.Brown).
(PIT, PIT 31, Q1, 4 and 1) (3:25) J.Berry punts 50 yards to NE 19, Center-G.Warren. D.Amendola to NE 34 for 15 yards (V.Williams). PENALTY on NE-M.Slater, Illegal Block Above the Waist, 10 yards, enforced at NE 20.
(NE, NE 10, Q1, 1 and 10) (3:14) D.Lewis left tackle to NE 18 for 8 yards (W.Allen).
(NE, NE 18, Q1, 2 and 2) (2:40) D.Lewis up the middle to NE 19 for 1 yard (M.Mitchell).
(NE, NE 19, Q1, 3 and 1) (2:05) T.Brady up the middle to NE 20 for 1 yard (L.Timmons; S.McLendon).
(NE, NE 20, Q1, 1 and 10) (1:14) D.Lewis left end pushed ob at NE 25 for 5 yards (L.Timmons). PENALTY on NE-N.Solder, Offensive Holding, 10 yards, enforced at NE 20 - No Play.
(NE, NE 10, Q1, 1 and 20) (:45) (Shotgun) T.Brady pass short left to A.Dobson to NE 19 for 9 yards (W.Gay).
(NE, NE 19, Q1, 2 and 11) (:12) (Shotgun) T.Brady pass short left to J.Edelman to NE 28 for 9 yards (C.Allen).
…
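To work with printed play descriptions like these as text, the fields in each line can be pulled out with a regular expression. The helper below is hypothetical and not part of nflgame. Its pattern only covers lines shaped like the ones above, and it reports the first "for N yards" it finds, so yardage on plays nullified by penalties is not adjusted.

```python
import re

# Hypothetical helper, not part of nflgame: parses the text of one printed play,
# e.g. "(PIT, NE 49, Q1, 1 and 10) (13:26) B.Roethlisberger pass ...".
PLAY_RE = re.compile(
    r"\((?P<team>[A-Z]+), (?P<field>[A-Z]+ \d+), Q(?P<quarter>\d)"
    r"(?:, (?P<down>\d) and (?P<to_go>\d+))?\)"  # kickoffs have no down/distance
    r"(?: \((?P<clock>\d*:\d\d)\))?"             # clock may be missing or like ":45"
    r" (?P<desc>.*)"
)
# First "for N yard(s)" in the description; penalties are not accounted for.
YARDS_RE = re.compile(r"for (-?\d+) yards?")


def parse_play(line):
    """Return a dict of fields from one printed play, or None if it doesn't match."""
    match = PLAY_RE.match(line)
    if match is None:
        return None
    play = match.groupdict()
    yards = YARDS_RE.search(play["desc"])
    play["yards"] = int(yards.group(1)) if yards else None
    return play


sample = [
    "(NE, NE 35, Q1) S.Gostkowski kicks 65 yards from NE 35 to end zone, Touchback.",
    "(PIT, NE 24, Q1, 1 and 10) (12:05) A.Brown sacked at NE 32 for -8 yards (M.Brown).",
    "(NE, NE 10, Q1, 1 and 20) (:45) (Shotgun) T.Brady pass short left to A.Dobson to NE 19 for 9 yards (W.Gay).",
]
for line in sample:
    play = parse_play(line)
    print(play["team"], play["field"], play["down"], play["to_go"], play["clock"], play["yards"])
```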
What does all this mean for me?
Work = Fun
BUT…
➤ A good understanding of the fundamentals of statistics and probability
➤ Ability to work with APIs and data
➤ Python, R, MySQL
➤ And, of course, domain knowledge