Page 1: Sports Analytics in the Era of Big Data and Data Science

SPORTS ANALYTICS IN THE ERA OF BIG DATA AND DATA SCIENCE

KONSTANTINOS PELECHRINIS

@kpelechrinis https://412sportsanalytics.wordpress.com

Page 2: Sports Analytics in the Era of Big Data and Data Science
Page 3: Sports Analytics in the Era of Big Data and Data Science

DATA-DRIVEN COACHES?

Page 4: Sports Analytics in the Era of Big Data and Data Science

DATA-DRIVEN FRONT OFFICES?

Page 5: Sports Analytics in the Era of Big Data and Data Science

WHY NOW?

➤ Data analysis & the use of statistics are not new in sports!!

➤ Now we have the technology to collect much more detailed information about the game

➤ Detailed box score

➤ Play-by-play data

➤ Player tracking

Page 6: Sports Analytics in the Era of Big Data and Data Science

TRACKING

Page 7: Sports Analytics in the Era of Big Data and Data Science

RESOURCES

Some of the examples are taken from this book

Page 8: Sports Analytics in the Era of Big Data and Data Science

SPORT MARKETS

➤ A typical business or firm operates with the objective of profit maximization

➤ This might not be the case for the owner of a professional sports team!!

➤ For profit year by year

➤ Maximize wins

➤ Capital appreciation

Page 9: Sports Analytics in the Era of Big Data and Data Science

SPORT MARKETS

➤ Becoming the dominant player is not the goal in the sports industry

➤ If a team were assured of victory in almost every competition, the whole league would be of little, if any, interest

➤ Competitive balance

➤ Salary cap!

➤ Draft!

Page 10: Sports Analytics in the Era of Big Data and Data Science

SPORT MARKETS

[Figure: scatter plot of percentage of games won (y-axis, 40 to 60) against team payroll in millions of dollars (x-axis, 50 to 250), one labeled point per MLB team.]

Correlation coefficient = 0.26, p-value = 0.16!

Only 6% of the win/loss percentage is explained by the payroll differences!
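The "explained" share quoted on this slide is the square of the correlation coefficient. The sketch below shows that relationship on made-up numbers; the payroll and win-percentage lists are illustrative assumptions and are not the data behind the slide's plot. Squaring the slide's rounded coefficient of 0.26 gives roughly 0.07, close to the 6% figure quoted.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical payrolls (millions of dollars) and win percentages.
# These are NOT the values plotted on the slide.
payroll = [75.0, 110.0, 150.0, 90.0, 230.0, 125.0]
win_pct = [54.0, 47.0, 51.0, 43.0, 52.0, 56.0]

r = pearson_r(payroll, win_pct)
r_squared = r ** 2  # share of win_pct variance captured by a linear fit on payroll
print(f"toy data: r = {r:.2f}, r^2 = {r_squared:.2f}")

# The slide's own (rounded) coefficient, squared.
slide_r = 0.26
slide_r_squared = slide_r ** 2
print(f"slide: r = {slide_r}, r^2 = {slide_r_squared:.4f}")
```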

Page 11: Sports Analytics in the Era of Big Data and Data Science

RANKING TEAMS

➤ Team performance is central to sports data science

➤ Ratings and rankings

➤ Challenges

➤ Imbalance in team schedules

➤ Win/loss percentage does not account for strength of schedule

Page 12: Sports Analytics in the Era of Big Data and Data Science

RANKING TEAMS

➤ Network-based solution

➤ Win/loss directed network

➤ PageRank
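A minimal sketch of the network idea, assuming each game adds a directed edge from the loser to the winner so that rank flows toward teams that win, and more so toward teams that beat strong opponents. The team names, the game list, the damping factor of 0.85, and the handling of undefeated teams are all assumptions for illustration, not details taken from the slides.

```python
def pagerank(games, damping=0.85, iterations=100):
    """Rank teams from (winner, loser) pairs using loser -> winner edges."""
    teams = sorted({team for game in games for team in game})
    # losses[loser][winner] = number of times loser lost to winner (edge weight)
    losses = {team: {} for team in teams}
    for winner, loser in games:
        losses[loser][winner] = losses[loser].get(winner, 0) + 1

    n = len(teams)
    rank = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        # Undefeated teams have no outgoing edges; spread their rank evenly.
        dangling = sum(rank[t] for t in teams if not losses[t])
        new_rank = {t: (1.0 - damping) / n + damping * dangling / n for t in teams}
        for loser, beaten_by in losses.items():
            total_losses = sum(beaten_by.values())
            for winner, count in beaten_by.items():
                new_rank[winner] += damping * rank[loser] * count / total_losses
        rank = new_rank
    return rank


# Hypothetical results as (winner, loser) pairs.
toy_games = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"),
    ("C", "D"),
]
scores = pagerank(toy_games)
ordering = sorted(scores, key=scores.get, reverse=True)
print(ordering)
```

Because a win over a highly ranked team passes along more rank than a win over a weak one, this kind of score reflects schedule strength in a way that raw win/loss percentage does not.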

Page 13: Sports Analytics in the Era of Big Data and Data Science

RANKING TEAMS

Page 14: Sports Analytics in the Era of Big Data and Data Science

RANKING TEAMS

Page 15: Sports Analytics in the Era of Big Data and Data Science

RANKING TEAMS

➤ Unidimensional scaling

➤ Matrix of how many times each team beats the other

➤ Transform to proportions, average across rows or columns, and standardize the result

➤ Automatic adjustment for schedule strength
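The steps listed above can be sketched as follows, under stated assumptions: the head-to-head win counts are made up, each opponent is weighted equally when averaging, and "standardize" is read as converting to z-scores. The scale used for the ranking scores in the talk may well differ.

```python
import statistics


def scaled_scores(teams, wins):
    """wins[a][b] is the number of times team a beat team b."""
    raw = {}
    for a in teams:
        proportions = []
        for b in teams:
            if a == b:
                continue
            a_over_b = wins.get(a, {}).get(b, 0)
            b_over_a = wins.get(b, {}).get(a, 0)
            played = a_over_b + b_over_a
            if played:
                # Proportion of head-to-head games that a won against b.
                proportions.append(a_over_b / played)
        # Row average: mean proportion across the opponents actually faced.
        raw[a] = statistics.mean(proportions) if proportions else 0.5

    values = list(raw.values())
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return {team: 0.0 for team in teams}
    # Standardize to z-scores so teams are comparable on one dimension.
    return {team: (raw[team] - mu) / sigma for team in teams}


# Hypothetical head-to-head win counts.
toy_teams = ["A", "B", "C"]
toy_wins = {
    "A": {"B": 3, "C": 2},
    "B": {"A": 1, "C": 2},
    "C": {"A": 0, "B": 2},
}
z = scaled_scores(toy_teams, toy_wins)
for team in sorted(z, key=z.get, reverse=True):
    print(team, round(z[team], 2))
```

Averaging per-opponent proportions, rather than pooling all games, keeps a team from being rewarded simply for playing a weak opponent many times; that is one plausible reading of the schedule adjustment mentioned on the slide.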

Page 16: Sports Analytics in the Era of Big Data and Data Science

RANKING TEAMS

[Figure: ranking score (y-axis, 0 to 600) for each NBA team (x-axis, ATL through WAS), one labeled point per team.]

Page 17: Sports Analytics in the Era of Big Data and Data Science

COACHING DECISIONS

➤ Evidence-based coaching

➤ Go for the 4th down or not?

➤ Go for the 2-point conversion or kick the extra point?

➤ Shoot for three to win or shoot for two to tie the game?

➤ …

➤ We can now quantify the rationality of coaches!

Page 18: Sports Analytics in the Era of Big Data and Data Science

COACHING DECISIONS

Page 19: Sports Analytics in the Era of Big Data and Data Science

COACHING DECISIONS

OR

Page 20: Sports Analytics in the Era of Big Data and Data Science

COACHING DECISIONS

$E[p] = 2 \cdot P(\text{2-point conversion succeeds}) - 1 \cdot P(\text{extra-point kick succeeds})$

[Figure: expected point gain (y-axis, -0.50 to 0.50) for each NFL team (x-axis, ARI through WAS), each team annotated with a count.]
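A small sketch of the expected point gain, reading the slide's formula as two times the two-point conversion success rate minus one times the extra-point success rate. The function name and the attempt counts below are hypothetical and are not any team's actual numbers.

```python
def expected_point_gain(two_pt_made, two_pt_attempts, pat_made, pat_attempts):
    """Expected points from going for two minus expected points from kicking."""
    two_pt_rate = two_pt_made / two_pt_attempts
    pat_rate = pat_made / pat_attempts
    return 2 * two_pt_rate - 1 * pat_rate


# Hypothetical team: 8 of 15 two-point tries converted, 38 of 40 kicks made.
gain = expected_point_gain(8, 15, 38, 40)
print(f"expected point gain per try: {gain:+.3f}")

# Break-even case: a 50% conversion rate against a kicker who never misses.
break_even = expected_point_gain(1, 2, 1, 1)
print(f"break-even example: {break_even:+.3f}")
```

A positive value suggests that, on average, going for two yields more points than kicking; with few attempts per team the estimate is noisy, which is presumably why the plot annotates each team with a count.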

Page 21: Sports Analytics in the Era of Big Data and Data Science

COACHING DECISIONS

Page 22: Sports Analytics in the Era of Big Data and Data Science

COACHING DECISIONS

[Figure: expected points gained (y-axis, -2 to 3) against distance to the goal line on 4th down (x-axis, 0 to 100), with a "Touchback" annotation.]

Page 23: Sports Analytics in the Era of Big Data and Data Science

COMPUTATIONAL GAME MODELS

Page 24: Sports Analytics in the Era of Big Data and Data Science

COMPUTATIONAL GAME MODELS

[Figure, two panels: ratio r (y-axis, -1.0 to 1.0) by quarter (x-axis, Q1 to Q4); turnover density (y-axis, 0.00 to 0.04) against time in minutes (x-axis, 0 to 60), with a legend by quarter (Q1 to Q4).]

Page 25: Sports Analytics in the Era of Big Data and Data Science

COMPUTATIONAL GAME MODELS

[Diagram elements: historical game data; correlation matrix; logistic regression model; bootstrap with B samples, producing $x_{11}, \dots, x_{B1}$ and $x_{12}, \dots, x_{B2}$; $P_1$ and $P_2$; hypothesis test $H_0: P_1 = P_2$ against $H_1: P_1 \neq P_2$; difference $P_1 - P_2$; p-value.]

[Figure: accuracy (y-axis, 0.00 to 1.00) by week (x-axis, 8 to 17) for the 2014 and 2015 seasons; panel annotations: mean accuracy = 0.627, 0.787, 0.517, 0.6.]

Page 26: Sports Analytics in the Era of Big Data and Data Science

LEAGUE CHANGES

➤ Can we predict and/or evaluate the impact of a rule change?

➤ What if we move the three point line further away?

➤ What was the impact of the new PAT rule?

➤ Will the new touchback rule give an advantage to the offense?

Page 27: Sports Analytics in the Era of Big Data and Data Science

LEAGUE CHANGES

Should the 3-point line be moved further away?

Page 28: Sports Analytics in the Era of Big Data and Data Science

LEAGUE CHANGES

Page 29: Sports Analytics in the Era of Big Data and Data Science

LEAGUE CHANGES

Page 30: Sports Analytics in the Era of Big Data and Data Science

SPORTS MARKETING

➤ Sports are part of the entertainment market

➤ Marketing decisions can always benefit from good data!

➤ What price should the ticket have?

➤ What team-branded merchandise should you sell?

➤ Does a swag promotion justify a higher ticket price?

➤ What is the best strategy for national branding?

➤ …

Page 31: Sports Analytics in the Era of Big Data and Data Science

SPORTS MARKETING

➤ Case study: Consumer preferences for Dodger Stadium seating

➤ Conjoint analysis

➤ Product profiles

➤ Consumers rank the products

➤ Ranking reveals their preference

Page 32: Sports Analytics in the Era of Big Data and Data Science

SPORTS MARKETING

Part-worths (i.e., regression coefficients) reflect the strength of consumer preferences for each level of each product attribute.

Page 33: Sports Analytics in the Era of Big Data and Data Science

SPORTS MARKETING

➤ Can we use these results to assess a consumer's willingness to pay for tickets?

➤ $20 tickets have a part-worth of 3.25, while $95 tickets have a part-worth of -3.50.

➤ The difference in part-worth is 6.75, which corresponds to a price difference of $75

➤ 1 unit of part-worth is worth $11.11 to the consumer

➤ For this consumer, the part-worth differential between a loge seat and a field seat is 2.75

➤ This consumer is willing to spend 2.75*11.11=$30.55 more for a field seat than for a loge seat
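The arithmetic on this slide can be checked with a few lines of code. The part-worths and prices are the ones quoted above (3.25 at $20, -3.50 at $95, a gap of 3.25 - (-3.50) = 6.75); the function name is an assumption for illustration. Using the unrounded dollar value per part-worth unit gives $30.56 rather than the slide's $30.55, which comes from multiplying by the rounded $11.11.

```python
def dollars_per_part_worth(price_low, part_worth_low, price_high, part_worth_high):
    """Dollar value of one part-worth unit implied by two price levels."""
    price_gap = price_high - price_low
    part_worth_gap = part_worth_low - part_worth_high
    return price_gap / part_worth_gap


# Values quoted on the slide for this consumer.
rate = dollars_per_part_worth(20.0, 3.25, 95.0, -3.50)

# Part-worth differential between a field seat and a loge seat.
field_vs_loge = 2.75
premium = field_vs_loge * rate

print(f"${rate:.2f} per part-worth unit")
print(f"${premium:.2f} extra for a field seat over a loge seat")
```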

Page 34: Sports Analytics in the Era of Big Data and Data Science

PROMOTING BRANDS & PRODUCTS

Page 35: Sports Analytics in the Era of Big Data and Data Science

PROMOTING BRANDS & PRODUCTS

= a* + b* + c* + d

Page 36: Sports Analytics in the Era of Big Data and Data Science

PROMOTING BRANDS & PRODUCTS

Page 37: Sports Analytics in the Era of Big Data and Data Science

DATA SOURCES

➤ There are various websites where you can get data

➤ Mainly aggregate statistics, box scores, etc.

Page 38: Sports Analytics in the Era of Big Data and Data Science

DATA SOURCES

➤ Flexibility -> play-by-play data

➤ Major leagues provide an API

➤ Sports enthusiasts have created libraries to access them

Case study: nflgame in Python

https://github.com/BurntSushi/nflgame

Page 39: Sports Analytics in the Era of Big Data and Data Science

DATA SOURCES

>>> import nflgame
>>> games = nflgame.games(2015, week=1, kind='REG')
>>> games
[<nflgame.game.Game object at 0x107652210>, <nflgame.game.Game object at 0x107652310>, <nflgame.game.Game object at 0x107652410>, <nflgame.game.Game object at 0x107652510>, <nflgame.game.Game object at 0x107652610>, <nflgame.game.Game object at 0x107652710>, <nflgame.game.Game object at 0x107652810>, <nflgame.game.Game object at 0x107652910>, <nflgame.game.Game object at 0x107652a10>, <nflgame.game.Game object at 0x107652b10>, <nflgame.game.Game object at 0x107652c10>, <nflgame.game.Game object at 0x107652d10>, <nflgame.game.Game object at 0x107652e10>, <nflgame.game.Game object at 0x107652f10>, <nflgame.game.Game object at 0x107d02050>, <nflgame.game.Game object at 0x107d02150>]
>>> games[0].home
u'NE'
>>> games[0].away
u'PIT'
>>> games[0].score_home
28
>>> games[0].score_away
21

Page 40: Sports Analytics in the Era of Big Data and Data Science

DATA SOURCES

>>> for i in games[0].drives:
...     print i
...
PIT (Start: Q1 15:00, End: Q1 09:40) Missed FG
NE (Start: Q1 09:40, End: Q1 07:41) Punt
PIT (Start: Q1 07:41, End: Q1 03:14) Punt
NE (Start: Q1 03:14, End: Q2 11:11) Touchdown
PIT (Start: Q2 11:11, End: Q2 08:38) Missed FG
NE (Start: Q2 08:38, End: Q2 04:01) Touchdown
PIT (Start: Q2 04:01, End: Q2 00:03) Field Goal
NE (Start: Q2 00:03, End: Q2 00:00) End of Half
NE (Start: Q3 15:00, End: Q3 10:37) Touchdown
PIT (Start: Q3 10:37, End: Q3 06:43) Touchdown
NE (Start: Q3 06:43, End: Q3 04:15) Punt
PIT (Start: Q3 04:15, End: Q4 11:39) Field Goal
NE (Start: Q4 11:39, End: Q4 09:20) Touchdown
PIT (Start: Q4 09:20, End: Q4 08:29) Punt
NE (Start: Q4 08:29, End: Q4 07:29) Punt
PIT (Start: Q4 07:29, End: Q4 07:00) Interception
NE (Start: Q4 07:00, End: Q4 02:59) Punt
PIT (Start: Q4 02:59, End: Q4 00:02) Touchdown
NE (Start: Q4 00:02, End: Q4 00:00) End of Game
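Once drive summaries are available as text, simple aggregates are easy to build. The sketch below is an illustration in Python 3 that does not call nflgame at all: it parses the first six printed drive lines with a regular expression and tallies drive outcomes per team. The pattern and the helper name are assumptions based only on the format of the printed output; with the library itself one would more likely read fields from the drive objects.

```python
import re
from collections import Counter

# Pattern inferred from the printed drive summaries (an assumption).
DRIVE_PATTERN = re.compile(
    r"^(?P<team>[A-Z]+) \(Start: (?P<start>Q\d \d{2}:\d{2}), "
    r"End: (?P<end>Q\d \d{2}:\d{2})\) (?P<result>.+)$"
)

# First six drive summaries from the printed output.
drive_lines = [
    "PIT (Start: Q1 15:00, End: Q1 09:40) Missed FG",
    "NE (Start: Q1 09:40, End: Q1 07:41) Punt",
    "PIT (Start: Q1 07:41, End: Q1 03:14) Punt",
    "NE (Start: Q1 03:14, End: Q2 11:11) Touchdown",
    "PIT (Start: Q2 11:11, End: Q2 08:38) Missed FG",
    "NE (Start: Q2 08:38, End: Q2 04:01) Touchdown",
]


def tally_drive_results(lines):
    """Count (team, result) pairs, skipping lines that do not match."""
    tally = Counter()
    for line in lines:
        match = DRIVE_PATTERN.match(line.strip())
        if match is None:
            continue
        tally[(match.group("team"), match.group("result"))] += 1
    return tally


results = tally_drive_results(drive_lines)
for (team, result), count in sorted(results.items()):
    print(team, result, count)
```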

Page 41: Sports Analytics in the Era of Big Data and Data Science

DATA SOURCES

>>> plays = nflgame.combine_plays(games)
>>> for p in plays:
...     print p
...
(NE, NE 35, Q1) S.Gostkowski kicks 65 yards from NE 35 to end zone, Touchback.
(PIT, PIT 20, Q1, 1 and 10) (15:00) De.Williams right tackle to PIT 38 for 18 yards (D.Hightower).
(PIT, PIT 38, Q1, 1 and 10) (14:21) B.Roethlisberger pass short right to A.Brown pushed ob at PIT 47 for 9 yards (D.Hightower).
(PIT, PIT 47, Q1, 2 and 1) (14:04) De.Williams right guard to NE 49 for 4 yards (J.Collins; M.Brown).
(PIT, NE 49, Q1, 1 and 10) (13:26) B.Roethlisberger pass short right to H.Miller to NE 35 for 14 yards (J.Mayo).
(PIT, NE 35, Q1, 1 and 10) (12:42) (Shotgun) De.Williams right guard to NE 24 for 11 yards (J.Collins).
(PIT, NE 24, Q1, 1 and 10) (12:05) A.Brown sacked at NE 32 for -8 yards (M.Brown).
(PIT, NE 32, Q1, 2 and 18) (11:20) (Shotgun) De.Williams right end pushed ob at NE 28 for 4 yards (D.Hightower). PENALTY on PIT-M.Gilbert, Offensive Holding, 10 yards, enforced at NE 32 - No Play.
(PIT, NE 42, Q1, 2 and 28) (10:53) W.Johnson right guard to NE 36 for 6 yards (R.Ninkovich). NE-D.Easley was injured during the play. He is Out.
(PIT, NE 36, Q1, 3 and 22) (10:28) (Shotgun) B.Roethlisberger pass short right to H.Miller to NE 26 for 10 yards (P.Chung; M.Butler).
(PIT, NE 26, Q1, 4 and 12) (9:44) J.Scobee 44 yard field goal is No Good, Wide Right, Center-G.Warren, Holder-J.Berry.
(NE, NE 34, Q1, 1 and 10) (9:40) (Shotgun) T.Brady pass short left to J.Edelman pushed ob at NE 47 for 13 yards (W.Gay). PENALTY on NE-N.Solder, Unnecessary Roughness, 15 yards, enforced between downs.
(NE, NE 32, Q1, 1 and 10) (9:14) (Shotgun) T.Brady pass short left to D.Lewis to NE 44 for 12 yards (J.Harrison).
(NE, NE 44, Q1, 1 and 10) (9:00) (No Huddle, Shotgun) T.Brady pass short left to D.Lewis ran ob at PIT 43 for 13 yards.
(NE, PIT 43, Q1, 1 and 10) (8:31) (No Huddle, Shotgun) T.Brady pass incomplete short right to R.Gronkowski.
(NE, PIT 43, Q1, 2 and 10) (8:27) T.Brady pass incomplete deep right to D.Amendola.
(NE, PIT 43, Q1, 3 and 10) (8:22) (Shotgun) T.Brady sacked at PIT 43 for 0 yards (B.Dupree).
(NE, PIT 43, Q1, 4 and 10) (7:48) R.Allen punts 36 yards to PIT 7, Center-J.Cardona, fair catch by A.Brown.
(PIT, PIT 7, Q1, 1 and 10) (7:41) De.Williams left guard to PIT 13 for 6 yards (A.Branch; G.Grissom).
(PIT, PIT 13, Q1, 2 and 4) (7:07) De.Williams left tackle to PIT 12 for -1 yards (C.Jones).
(PIT, PIT 12, Q1, 3 and 5) (6:26) (Shotgun) B.Roethlisberger pass short left to A.Brown pushed ob at PIT 22 for 10 yards (D.McCourty).
(PIT, PIT 22, Q1, 1 and 10) (5:54) De.Williams right guard to PIT 26 for 4 yards (R.Ninkovich). PENALTY on PIT-K.Beachum, Illegal Formation, 5 yards, enforced at PIT 22 - No Play.
(PIT, PIT 17, Q1, 1 and 15) (5:29) (Shotgun) B.Roethlisberger pass short right to A.Brown to PIT 20 for 3 yards (J.Collins).
(PIT, PIT 20, Q1, 2 and 12) (4:48) B.Roethlisberger sacked at PIT 14 for -6 yards (D.Hightower).
(PIT, PIT 14, Q1, 3 and 18) (4:03) (Shotgun) B.Roethlisberger pass deep left to H.Miller to PIT 31 for 17 yards (D.McCourty; T.Brown).
(PIT, PIT 31, Q1, 4 and 1) (3:25) J.Berry punts 50 yards to NE 19, Center-G.Warren. D.Amendola to NE 34 for 15 yards (V.Williams). PENALTY on NE-M.Slater, Illegal Block Above the Waist, 10 yards, enforced at NE 20.
(NE, NE 10, Q1, 1 and 10) (3:14) D.Lewis left tackle to NE 18 for 8 yards (W.Allen).
(NE, NE 18, Q1, 2 and 2) (2:40) D.Lewis up the middle to NE 19 for 1 yard (M.Mitchell).
(NE, NE 19, Q1, 3 and 1) (2:05) T.Brady up the middle to NE 20 for 1 yard (L.Timmons; S.McLendon).
(NE, NE 20, Q1, 1 and 10) (1:14) D.Lewis left end pushed ob at NE 25 for 5 yards (L.Timmons). PENALTY on NE-N.Solder, Offensive Holding, 10 yards, enforced at NE 20 - No Play.
(NE, NE 10, Q1, 1 and 20) (:45) (Shotgun) T.Brady pass short left to A.Dobson to NE 19 for 9 yards (W.Gay).
(NE, NE 19, Q1, 2 and 11) (:12) (Shotgun) T.Brady pass short left to J.Edelman to NE 28 for 9 yards (C.Allen).
….

Page 42: Sports Analytics in the Era of Big Data and Data Science

What does all this mean for me?

Page 43: Sports Analytics in the Era of Big Data and Data Science

Work = Fun

Page 44: Sports Analytics in the Era of Big Data and Data Science

BUT…

➤ Good understanding of the fundamentals of statistics and probability

➤ Ability to work with APIs and data

➤ Python, R, MySQL

➤ Of course domain knowledge

Page 45: Sports Analytics in the Era of Big Data and Data Science
Page 46: Sports Analytics in the Era of Big Data and Data Science