April Madness


April 20th, 2016

Raunak Mundada, Peter Wendel, Addie Olson

Outline

• Question of interest and background

• Data Cleaning

• Seed prediction

• Bracket adjustment

• NCAA tournament predictions

• Results

• Conclusion

Question of Interest

• How would SMU have performed in the 2016 NCAA Division I Men’s Basketball Championship?

• Predict the seed for SMU

• Adjust the 2016 bracket for the tournament and predict the result of each match

• Simulate the entire tournament and estimate SMU’s odds

Background

(NCAA) Men's Division I Basketball Tournament

• Single-elimination tournament played each spring in the United States, currently featuring 68 college basketball teams, to determine the national championship of the major college basketball teams.

• The NCAA sanctioned the SMU men's basketball program for multiple violations; the penalty kept SMU out of the 2016 postseason tournament.

• SMU Men’s Basketball was the last undefeated team to lose this season

• The team spent most of the season in the top 25, peaking at #8 in the country

Data Cleaning

Raw Data for each season

Data source: https://www.kaggle.com/c/march-machine-learning-mania-2016

1. Raw data with winning and losing team game stats for each matchup in the season

   Wteam ID   Wscore   ...   Lteam ID   Lscore   ...
   1374       75       ...   1106       70       ...

2. Calculate season average stats for each team

   Team ID   AvgScore   ...
   1374      73.2       ...
   1106      75.8       ...

3. Bind season average score to each team for each matchup in the previous tourneys

   Wteam ID   WAvgScore   ...   Lteam ID   LAvgScore   ...
   1374       75.8        ...   1106       73.2        ...

4. Convert win to binary variable based on previous March Madness results

   Team ID   TeamAvgScore   ...   Team1Won
   1374      75.8           ...   1

   Team ID   TeamAvgScore   ...   Team1Won
   1374      73.2           ...   0

5. Use ratios of team average statistics as regressors

   Team1 ID   Team2 ID   ScoreRatio           ...   Team1Won
   1374       1106       75.8 / 73.2 ≈ 1.04   ...   1
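The five preprocessing steps can be sketched in plain Python. This is a minimal illustration, not the code used for the project: the game rows, the function names and the rule that Team1 is the lower team ID are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical regular-season rows: (winner id, winner score, loser id, loser score).
season_games = [
    (1374, 75, 1106, 70),
    (1106, 80, 1374, 72),
    (1374, 78, 1106, 74),
]

# Hypothetical past tournament match-ups: (winner id, loser id).
tourney_games = [(1374, 1106)]


def season_averages(games):
    """Step 2: average points scored per team over the season."""
    totals = defaultdict(lambda: [0, 0])  # team id -> [points, games played]
    for w_id, w_score, l_id, l_score in games:
        totals[w_id][0] += w_score
        totals[w_id][1] += 1
        totals[l_id][0] += l_score
        totals[l_id][1] += 1
    return {team: pts / n for team, (pts, n) in totals.items()}


def matchup_rows(tourney, avg_score):
    """Steps 3-5: attach averages, encode the outcome, form the ratio."""
    rows = []
    for w_id, l_id in tourney:
        # Assumption: Team1 is the lower team id, so the label is not always 1.
        team1, team2 = sorted((w_id, l_id))
        rows.append({
            "team1": team1,
            "team2": team2,
            "score_ratio": avg_score[team1] / avg_score[team2],
            "team1_won": 1 if team1 == w_id else 0,
        })
    return rows


avg = season_averages(season_games)
rows = matchup_rows(tourney_games, avg)
```

Ordering the two teams by a rule that ignores the result keeps the Team1Won column from being constant, which a classifier needs in order to learn anything from the ratios.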

Data set after preprocessing

Seed Prediction & Bracket Adjustment

Decision Tree for Seeding

• Used 2003 - 2014 regular season data to build a decision tree, with a max depth of 5, to predict seed

● Tested on 2015 and 2016 regular season data

SMU’s seed estimate: 6

Training Data Test Data

Bracket Adjustments (SMU as a 6 seed)

• Find and remove the least qualified team that did not win its conference (Tulsa, 11 seed)

• Insert SMU as a 6 seed

• Bump the least qualified 6 seed to a 7 seed

• Repeat until the least qualified 10 seed takes Tulsa’s initial position as an 11 seed

• Use data from the actual play-in games to select winners

Bracket Adjustments (SMU as a 7 seed)

• Find and remove the least qualified team that did not win its conference (Tulsa, 11 seed)

• Insert SMU as a 7 seed

• Bump the least qualified 7 seed to an 8 seed

• Repeat until the least qualified 10 seed takes Tulsa’s initial position as an 11 seed

• Use data from the actual play-in games to select winners
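The cascade described in these bracket adjustments can be sketched as follows. This is an illustration under stated assumptions: the two-team seed lines, the placeholder team names (A6, B6, ...) and the `rating` scores are hypothetical, and "least qualified" is reduced to the lowest rating on a seed line.

```python
def insert_team(bracket, new_team, new_seed, removed_team, removed_seed, rating):
    """Insert new_team at new_seed and cascade the least qualified team
    on each seed line down one line, ending in the removed team's slot.

    bracket: dict mapping seed -> list of team names.
    rating:  dict mapping team name -> qualification score (higher is better).
    """
    adjusted = {seed: list(teams) for seed, teams in bracket.items()}
    adjusted[removed_seed].remove(removed_team)  # free the slot
    incoming = new_team
    for seed in range(new_seed, removed_seed):
        # Least qualified team already on this line gets bumped down.
        bumped = min(adjusted[seed], key=rating.get)
        adjusted[seed].remove(bumped)
        adjusted[seed].append(incoming)
        incoming = bumped
    adjusted[removed_seed].append(incoming)  # fills the freed slot
    return adjusted


# Hypothetical bracket fragment: two teams per seed line, seeds 6 to 11.
bracket = {seed: [f"A{seed}", f"B{seed}"] for seed in range(6, 11)}
bracket[11] = ["Tulsa", "A11"]

# Hypothetical ratings: every "B" team is the weaker team on its line.
rating = {}
for seed in range(6, 11):
    rating[f"A{seed}"] = 2.0
    rating[f"B{seed}"] = 1.0

adjusted = insert_team(bracket, "SMU", 6, "Tulsa", 11, rating)
```

Calling the same function with `new_seed=7` gives the second scenario; every seed line keeps its original size in both cases.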

NCAA tournament predictions

Model Building Process

Model Accuracy

Model                                        Accuracy             Sensitivity          Specificity
                                             Training   Test      Training   Testing   Training   Testing
Logistic Regression                          67.3%      71.6%     66.5%      66.2%     68.2%      77.3%
Logistic Regression - Significant Features   68.8%      65.7%     60.34%     54.4%     77.5%      77.3%
Penalized Logistic Regression                68.3%      70.9%     66.3%      71%       71%        72.3%
Random Forest                                69%        65.7%     69.3%      63.2%     68.7%      68.2%

Logistic Regression - Significant Features: Variable Importance

Logistic Regression Details

Confusion Matrix

                  Reference
Prediction        Team 2 Wins   Team 1 Wins
Team 2 Wins       45            15
Team 1 Wins       23            51

• Model output on test data set

• Test data set includes season averages from 2013, 2014 and 2015

• The outcome variable corresponds to the results of the 2013, 2014 and 2015 NCAA March Madness match-ups

AUC = 74.3%
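As a check, the counts in the confusion matrix above reproduce the test-set accuracy, sensitivity and specificity reported for Logistic Regression. The sketch assumes "Team 2 Wins" is treated as the positive class; the variable names are illustrative.

```python
# Counts from the confusion matrix: rows are predictions, columns are reference.
pred_t2_ref_t2 = 45  # predicted Team 2 wins, Team 2 actually won
pred_t2_ref_t1 = 15  # predicted Team 2 wins, Team 1 actually won
pred_t1_ref_t2 = 23  # predicted Team 1 wins, Team 2 actually won
pred_t1_ref_t1 = 51  # predicted Team 1 wins, Team 1 actually won

total = pred_t2_ref_t2 + pred_t2_ref_t1 + pred_t1_ref_t2 + pred_t1_ref_t1

# Assumption: "Team 2 Wins" is the positive class.
accuracy = (pred_t2_ref_t2 + pred_t1_ref_t1) / total
sensitivity = pred_t2_ref_t2 / (pred_t2_ref_t2 + pred_t1_ref_t2)
specificity = pred_t1_ref_t1 / (pred_t1_ref_t1 + pred_t2_ref_t1)

print(f"accuracy    = {accuracy:.1%}")     # 96 / 134
print(f"sensitivity = {sensitivity:.1%}")  # 45 / 68
print(f"specificity = {specificity:.1%}")  # 51 / 66
```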

Simulation Process
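A Monte Carlo simulation of a single-elimination bracket can be sketched as below. This is only an illustration of the idea: the logistic `win_probability` stands in for the fitted model, and the 8-team field, the strength values, the `slope` parameter and the number of simulations are all hypothetical. The real field has 68 teams with play-in games, which this sketch does not model.

```python
import math
import random
from collections import Counter


def win_probability(strength_a, strength_b, slope=5.0):
    """Hypothetical stand-in for the fitted model:
    logistic function of the log strength ratio."""
    return 1.0 / (1.0 + math.exp(-slope * math.log(strength_a / strength_b)))


def simulate_once(field, strength, rng):
    """Play one bracket and return each team's exit round (1-based).

    field: team names in bracket order; length must be a power of two.
    The champion is recorded with exit round = number of rounds + 1.
    """
    exit_round = {}
    alive = list(field)
    round_no = 1
    while len(alive) > 1:
        survivors = []
        for a, b in zip(alive[0::2], alive[1::2]):
            p = win_probability(strength[a], strength[b])
            winner, loser = (a, b) if rng.random() < p else (b, a)
            exit_round[loser] = round_no
            survivors.append(winner)
        alive = survivors
        round_no += 1
    exit_round[alive[0]] = round_no
    return exit_round


def exit_distribution(field, strength, team, n_sims=200, seed=0):
    """Share of simulated tournaments in which `team` exits in each round."""
    rng = random.Random(seed)
    counts = Counter(
        simulate_once(field, strength, rng)[team] for _ in range(n_sims)
    )
    return {r: counts[r] / n_sims for r in sorted(counts)}


# Hypothetical 8-team field with made-up strength values.
field = ["SMU", "T2", "T3", "T4", "T5", "T6", "T7", "T8"]
strength = {"SMU": 75.8, "T2": 73.2, "T3": 78.0, "T4": 71.5,
            "T5": 74.0, "T6": 76.1, "T7": 72.3, "T8": 77.4}

dist = exit_distribution(field, strength, "SMU")
```

Rerunning the simulation with SMU placed on a different seed line changes its path through the bracket, which is what drives the difference between the two seeding scenarios.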

SMU Predicted Performance

The graph is read as follows:

• When SMU is a 6 seed, its run ends in the Sweet 16 round in 1% of simulations

• In the same sense, SMU has a 0.5% chance of winning the tournament when seeded 7

SMU Predicted Performance for NCAA 2016

(Bar chart; x-axis: Exit Round, y-axis: Total Visits)

Exit Round          SMU Seed 6   SMU Seed 7
Round 1             42.0%        10.5%
Round 2             55.5%        66.5%
Sweet 16            1.0%         9.5%
Elite 8             1.0%         7.5%
Final 4             0.5%         4.0%
Championship Game   0.0%         1.5%
Champions           0.0%         0.5%

SMU Predicted Performance

The cumulative probability graph tells us that, when seeded 6, SMU has a 1.5% probability of making it to at least the Elite 8.

Cumulative Probability of SMU Advancing to Each Round

Round               SMU 7 Seed   SMU 6 Seed
Round 2             89.5%        58.0%
Sweet 16            23.0%        2.5%
Elite 8             13.5%        1.5%
Final 4             6.0%         0.5%
Championship Game   2.0%         0.0%
Champions           0.5%         0.0%
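The cumulative probabilities follow directly from the exit-round shares in the previous chart: the chance of reaching a round is the share of simulations that exit in that round or later. A short sketch, with the function name chosen for illustration:

```python
rounds = ["Round 1", "Round 2", "Sweet 16", "Elite 8",
          "Final 4", "Championship Game", "Champions"]

# Exit-round shares (percent) from the exit-round chart.
exit_pct = {
    "SMU Seed 6": [42.0, 55.5, 1.0, 1.0, 0.5, 0.0, 0.0],
    "SMU Seed 7": [10.5, 66.5, 9.5, 7.5, 4.0, 1.5, 0.5],
}


def reach_at_least(exit_shares):
    """Percent of runs that reach each round (exit in that round or later)."""
    cumulative = []
    remaining = sum(exit_shares)
    for share in exit_shares:
        cumulative.append(round(remaining, 1))
        remaining -= share
    return cumulative


for label, shares in exit_pct.items():
    reached = reach_at_least(shares)
    # Skip Round 1: every team in the field reaches it.
    print(label, dict(zip(rounds[1:], reached[1:])))
```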

Comparison with Vegas probabilities

Championship Probability for top 8 teams & SMU

(Bar chart; data labels by series)

Vegas Probability: 15.38%, 14.29%, 12.50%, 8.33%, 8.33%, 6.25%, 5.56%, 5.56%, 0.00%

Probability - SMU Seed 6: 13.0%, 27.0%, 14.5%, 3.5%, 4.5%, 0.0%, 20.5%, 2.5%, 0.0%

Probability - SMU Seed 7: 13.50%, 22.00%, 17.00%, 4.50%, 4.50%, 0.00%, 21.50%, 2.00%, 1.50%

Overall, the model overestimates the probability of winning the championship for some teams and underestimates it for others.

The model underestimates the probability of winning for Villanova (the eventual winner of NCAA March Madness 2016).

North Carolina and Villanova Predicted Performance

Exit Round          North Carolina   Villanova
Round 1             6.0%             10.5%
Round 2             7.5%             22.0%
Sweet 16            26.5%            26.5%
Elite 8             19.5%            22.5%
Final 4             18.5%            8.5%
Championship Game   7.5%             7.5%
Champions           14.5%            2.5%

Predicted performance for the finalists

• North Carolina and Villanova were the finalists

• Our model gives North Carolina a 14.5% chance of winning

• However, the winner was Villanova (2.5% probability of winning according to our model)

Feature Importance across models

Conclusion

According to our seeding model and bracket simulator, SMU would have had a 1.5% probability of winning the championship had they been eligible for postseason play. As the tournament goes on, the probabilities dwindle, but the ever-present possibility of an upset and their strong regular season performance would have made the Mustangs a formidable opponent for any team in any round.

Further Questions -

• Further investigation into prediction of seeding

• Simulate more brackets of different seedings, readings

• Investigate independence of games and include player information in the model

• Include past March Madness results

Thank you