
Bank Shots to Bankroll

Joseph DeLay, Adam Pescatore, Zach Meyer, Lucas Howes, Josh Danker

The University of Iowa College of Liberal Arts and Sciences

Abstract

The goal of our research is to determine, based on win shares (WS) and player

efficiency rating (PER), what the expected salary should be for each player in the NBA.

We then want to take that number and compare it to what players were paid during the

2013-14 NBA season to see if players were paid what their statistics say they deserved,

what teams were best at spending money in general, and if there is any information

about NBA pay in general that can be learned from the data.

Introduction

The data we are using is from a spreadsheet from industry-leading analyst Nate Brixius’

blog, completed with statistics from the 2013-14 NBA season, such as games played,

minutes played, field goals made, field goals attempted, and many more (Brixius). We

simply added minutes per game, PER, and player salaries. We did not use every single

NBA player in our data set. In order to be eligible for a PER, a player must have played

an average of 6.09 minutes per game throughout the season, so that left us with 337

data points, or eligible players. We want to conduct an analysis of WS and PER and

create a formula using those two variables and league average salary to determine what

players at certain levels of those statistics should be paid.

Now, this does not mean that we are looking to see if any teams are saving

money by paying players less than what they deserve. If we determine a specific player

deserves $8 million a year, but the team is paying him $6 million, then clearly the player

is being underpaid, so we would consider the team to be getting a good deal. We assume that there is a big difference

between the surplus that the overpaid players are receiving and the deficit for the

underpaid players, but we want to make sure we weed out the noise, or the salary

differences that are negligible based on a percentage of the league average salary,


before we start making analyses. We will now explain the two statistics we are using,

PER and win shares, as well as the NBA salary cap more in depth.

PER Explained

PER is an acronym for player efficiency rating. PER is an advanced statistic that measures

a player’s effectiveness on a per-minute basis, while taking into consideration the pace

at which a team plays. Created by statistician John Hollinger, PER is a formula that

includes but is not limited to “positive accomplishments such as field goals, free throws,

3-pointers, assists, rebounds, blocks, and steals, and negative ones such as missed shots,

turnovers and personal fouls” (Hollinger, 2011). PER is an effective statistic to use

because it can compare two players even if there is a significant minutes gap, meaning

one player is on the court significantly longer than another player. One flaw in PER is

that it cannot measure a player’s defensive efficiency. However, PER is useful because it

can “summarize a player’s statistical accomplishments in a single number” (Hollinger,

2011).

Win Shares Explained

According to David Corby of Basketball Reference, win shares is a statistic to “credit a

player’s total measurable contribution to his team’s win total during the season”

(Casciaro 2015). Unlike PER, win shares can measure both offensive and defensive

productivity from a player. Statistics on offense include field goals, assists, free throws,

and offensive rebounds that lead to points. However, defensive statistics are not

measured as easily as offensive statistics. One aspect of defense that is measured to

determine win shares is a “stop.” A stop is generally given to a player who gets a steal,

block or defensive rebound. Thus, when factoring in defense to win shares, the main

criteria measured is how often a team gets a stop, as well as the player who forces a

stop. One advantage win shares has over PER is that it measures the player’s total

productivity, instead of the player’s per minute productivity. In other words, win shares

tells us how much a player produces, and PER tells us how efficient a player is.


Salary Cap Explained

Since the statistics used in our research are from the 2013-14 NBA season, the salary

cap we use will also be from that season because the salary cap changes every year.

According to senior NBA analyst Sekou Smith, the salary cap for the 2013-14 NBA season

was $58.679M (Smith). Furthermore, the minimum a team was required to spend was

$52.811M, or 90 percent of the salary cap. However, the salary cap is not necessarily the

maximum a team can spend on players. The maximum a team can spend without a

penalty is called the tax level, and it was set at $71.748 million. However, if a team

exceeds the tax level, they will have to pay the NBA the following fees (provided by

Sekou Smith):

• Portion of team salary $0-$4.99 million over tax level: $1.50 for $1

• Portion of team salary $5-$9.99 million over tax level: $1.75 for $1

• Portion of team salary $10-$14.99 million over tax level: $2.50 for $1

• Portion of team salary $15-$19.99 million over tax level: $3.25 for $1

• Rates increase by $0.50 for each additional $5 million of team salary above the tax

level.

For example, if a team exceeds the tax level by 2 million dollars, they must pay the NBA

a fee of 3 million dollars. As long as they stay under the tax level, if a team exceeds the

initial salary cap of $58.679 million, there is no penalty. Conversely, if a team

exceeds the tax level, they will have to pay a fee explained in the above bullet points.
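As a check on the example above, the bracket schedule can be sketched as a marginal calculation. This is a sketch under the figures quoted from Smith; the function names are ours, and amounts are in millions of dollars.

```python
def bracket_rate(bracket):
    """Tax rate ($ owed per $1 over) for the given $5M bracket (0-indexed),
    per the 2013-14 schedule quoted above."""
    rates = [1.50, 1.75, 2.50, 3.25]
    if bracket < len(rates):
        return rates[bracket]
    return 3.25 + 0.50 * (bracket - 3)  # +$0.50 per additional $5M bracket

def luxury_tax(overage_millions):
    """Total fee (in $M) owed on a team salary `overage_millions` above the
    tax level, applying each bracket's rate to the portion inside it."""
    fee, paid, bracket = 0.0, 0.0, 0
    while paid < overage_millions:
        portion = min(5.0, overage_millions - paid)
        fee += portion * bracket_rate(bracket)
        paid += portion
        bracket += 1
    return fee

print(luxury_tax(2.0))  # 3.0, matching the $2M-over example in the text
```

Note that the first $2M over the tax level is all inside the first bracket, so the fee is simply $2M at $1.50 per $1.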


Background

Throughout our project we used a concept called Topological Data Analysis. This is a

mathematical technique that involves creating simplicial complexes. We represent

data as a simplicial complex in order to discover its topological attributes. A

simplicial complex is a type of graph composed of points, edges, triangles (faces), tetrahedra, etc., as seen in Figure 1. One way to create these edges is based upon

how close the points are to each other using a Euclidean distance. There are many

other ways of measuring how “close” points are to each other, but we won’t get into those, since we did not use any of them. We use epsilon balls to determine if points are close to each other. By this we mean that if another point is within a distance of radius epsilon, we connect the points with an edge. Epsilon is just a fancy word for a number that we can change or choose, so having a radius of epsilon is no different than having a radius of 1; we just choose epsilon to equal 1. We can then use these edges we just created to similarly create faces. To create a face, all three edges need to be connected and there needs to be a common area of intersection among all three points’ epsilon balls. If all three points’ epsilon balls don’t overlap, then there is a

hole. For example, in Figure 2, the left complex would have a hole because the 3

epsilon balls do not intersect in the middle. However, the right figure would be filled in, creating a face, because they all overlap in the center. The simplicial complex for the example on the left would be a triangle with white in the middle, a hole, while the example on the right would be colored in, creating a face, with no hole.
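The edge-creation step can be sketched as follows. This is a minimal sketch with an illustrative name; here we treat two epsilon balls as overlapping when their centers are within 2·epsilon (the text above connects points within epsilon itself, which is the same idea with a different threshold).

```python
import math

def build_edges(points, eps):
    """Connect two points with an edge when their epsilon balls overlap,
    i.e. the centers are within 2 * eps of each other."""
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= 2 * eps:
                edges.append((i, j))
    return edges

print(build_edges([(0, 0), (1, 0), (5, 0)], 0.6))  # [(0, 1)]
```

Growing eps and re-running this is exactly the filtration idea used later for barcodes.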

Figure 1: http://inperc.com/wiki/index.php?title=Simplicial_homology

In topological data analysis many inferences can be made depending on the number

of holes and the location of these holes compared to the rest of your data in the

graph. Making inferences based upon holes works in higher dimensions as well, not just for two-dimensional triangles and faces. This way of

creating simplicial complexes is called the Čech Complex. Another style of

simplicial complex building is called the Vietoris-Rips complex.

Vietoris-Rips Complex 

This type of simplicial complex construction does not involve creating epsilon balls and needing a common intersection to fill in a hole. With this style, if the edges are connected, you fill in the complex; you fill in all complexes of dimension greater than one. This is especially important when building our persistence diagrams and barcodes because once the edges of a cycle all become connected, the cycle is filled in and dies. This allows us to find long-lasting cycles and determine their importance. There are some drawbacks, though; for example, in Figure 2, both simplicial complexes would be identical and

they would be filled in. Topology has some loss of precision but we can still learn a

lot from it. The reason for doing this will be explained more when discussing the

study of homology.
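The difference between the two fill rules can be sketched in code. The Čech check below uses the fact that three equal-radius disks share a common point exactly when the smallest circle enclosing their centers has radius at most epsilon; the function names are our own illustrative ones, not from any library.

```python
import math

def min_enclosing_radius(p, q, r):
    """Radius of the smallest circle containing three points in the plane."""
    a, b, c = sorted([math.dist(q, r), math.dist(p, r), math.dist(p, q)])
    if c * c >= a * a + b * b:   # right or obtuse triangle:
        return c / 2             # the longest side is the diameter
    # acute triangle: use the circumscribed circle (Heron's formula for area)
    area = 0.25 * math.sqrt((a + b + c) * (-a + b + c) * (a - b + c) * (a + b - c))
    return (a * b * c) / (4 * area)

def cech_fills(p, q, r, eps):
    """Cech rule: fill the triangle only if the three eps-balls share a
    common point, i.e. the enclosing circle of the centers fits in eps."""
    return min_enclosing_radius(p, q, r) <= eps

def rips_fills(p, q, r, eps):
    """Vietoris-Rips rule: fill the triangle as soon as all three edges
    exist (pairwise ball overlap, centers within 2 * eps)."""
    return max(math.dist(p, q), math.dist(p, r), math.dist(q, r)) <= 2 * eps

# Figure 2's situation: an equilateral triangle of side 2 with eps = 1.
tri = [(0.0, 0.0), (2.0, 0.0), (1.0, math.sqrt(3))]
print(rips_fills(*tri, 1.0))  # True: all edges exist, so Rips fills it
print(cech_fills(*tri, 1.0))  # False: the balls miss the center, leaving a hole
```

This is exactly the loss of precision mentioned above: at this epsilon the Rips complex is filled while the Čech complex still has a hole.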

Figure 2: mimicked from PowerPoint 1, slide 23, from Isabel Darcy.

Homology

Homology studies and compares manifolds, their triangulations, simplicial

complexes, and holes. The two complexes described above are used to determine homology.

Homology also deals with collections of edges called cycles: a collection of edges is a cycle if you can travel around the edges and get back to the same point. In homology we are dealing with Z mod 2 coefficients. This means that all the values are either 0 or 1, so if you add two edges each with coefficient 1 mod 2, the sum equals 0 mod 2.

Homology is equal to cycles mod boundaries. This method can tell whether a certain

collection of edges is a circle, a face, a torus, or a ball. While homology deals with a

collection of edges, persistent homology goes further with this idea by telling us how long these cycles exist.
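The "cycles mod boundaries" computation with Z mod 2 coefficients can be illustrated on a single triangle. This is a sketch with hand-written boundary matrices; `gf2_rank` and `betti` are our own illustrative names.

```python
import numpy as np

def gf2_rank(m):
    """Rank of a 0/1 matrix over Z mod 2, via Gaussian elimination."""
    m = np.array(m, dtype=np.uint8) % 2
    rank, rows, cols = 0, m.shape[0], m.shape[1]
    for col in range(cols):
        pivot = next((r for r in range(rank, rows) if m[r, col]), None)
        if pivot is None:
            continue
        m[[rank, pivot]] = m[[pivot, rank]]   # swap pivot row into place
        for r in range(rows):
            if r != rank and m[r, col]:
                m[r] ^= m[rank]               # mod-2 row reduction
        rank += 1
    return rank

def betti(d_k, d_k1, num_k_simplices):
    """dim H_k = dim ker(d_k) - dim im(d_{k+1}): cycles mod boundaries."""
    cycles = num_k_simplices - gf2_rank(d_k)
    boundaries = gf2_rank(d_k1)
    return cycles - boundaries

# Triangle on vertices a, b, c with edges ab, bc, ca (rows = vertices).
d1 = [[1, 0, 1],
      [1, 1, 0],
      [0, 1, 1]]
d2_hollow = [[0], [0], [0]]  # no 2-simplex: the triangle is not filled
d2_filled = [[1], [1], [1]]  # one filled face abc

print(betti(d1, d2_hollow, 3))  # 1: the unfilled triangle is a hole
print(betti(d1, d2_filled, 3))  # 0: the face fills the hole
```

The three edges form a cycle; whether that cycle survives in H1 depends on whether it bounds the filled face, which is exactly the "cycles mod boundaries" idea above.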

Persistent Homology

Persistent homology has a lot of the same characteristics as homology. Instead of

using a single distance for our epsilon balls, we gradually enlarge our epsilon balls to

determine whether certain cycles persist throughout time. These persistent cycles

tell us the importance of the shapes and holes by showing us how long each cycle

lasts. With our data, we are trying to determine which of these cycles are noise and

which are important to us. We will determine this by using two methods, barcodes

and persistence diagrams.

Barcodes

Barcodes are used to help researchers better understand clusters of certain dimensions

in their data. They are based upon when a cycle starts and when it gets close enough to

another cycle to become part of that cycle. This absorption is called the death of the

cycle. A barcode is created on an axis by taking the time the cycle is created and creating

a line that stretches from the time the cycle began to the time it dies. An important

note about creating a barcode is that it is based upon a filtered complex. This means that each point and edge comes into existence at a certain time, because we grow our epsilon balls gradually, in steps. When the epsilon ball


intersects with another, the younger of the two cycles dies. Because of this fact we can

tell when cycles start and end which can tell us how important they are. This coupled

with persistence diagrams can tell us a lot about our data. The three barcodes we

computed are shown on the next page.
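The H0 filtration described above can be sketched as a minimal single-linkage construction: every point is born at time 0, and when two components touch, the younger one dies. We use the center-to-center distance as the death time (halve it if deaths are measured by ball radius); `h0_barcode` is an illustrative name, not the program we actually used.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """H0 bars as (birth, death) pairs. The merge radii are exactly the
    edges of a minimum spanning tree of the point cloud."""
    comp = list(range(len(points)))  # simple union-find forest

    def find(i):
        while comp[i] != i:
            i = comp[i]
        return i

    bars = []
    dists = sorted((math.dist(p, q), i, j)
                   for (i, p), (j, q) in combinations(enumerate(points), 2))
    for d, i, j in dists:
        ri, rj = find(i), find(j)
        if ri != rj:
            comp[rj] = ri
            bars.append((0.0, d))    # one component dies at this radius
    bars.append((0.0, math.inf))     # the last component lives forever
    return bars

print(h0_barcode([(0, 0), (1, 0), (5, 0)]))
```

The two tightly spaced points merge early (a short bar), the far point merges late (a long bar), and one infinite bar always remains, matching the discussion of H0 below.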

Barcodes from Perseus-

Barcodes From R- We attempted to make an H2 but we had an insufficient memory

error and could not make it.

Persistence Diagrams


A persistence diagram is used similarly to barcodes in the way that they both involve the

starting and ending of cycles and that they are both visual representations of the data

set. However, persistence diagrams are more of a graph, because they map the birth time on the x-axis vs. the death time on the y-axis. There is also a line drawn at y = x. This line on

the graph can help us determine which cycles are noise because most of the cycles that

are close to this line are considered noise. The reason we think much of this is noise is

because the starting time is so close to the ending time of the cycle that most times

these cycles are not important. There are exceptions when short cycles can be

important, but we will not get into those in this report. It is important to note that

persistence diagrams and barcodes are just two ways of visualizing the same

information.
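The diagonal-based noise rule can be sketched as a simple persistence threshold. The threshold value itself is a judgment call, and `split_noise` is our own name for this helper.

```python
def split_noise(diagram, min_persistence):
    """Split (birth, death) pairs by their distance to the diagonal y = x.
    Pairs whose persistence (death - birth) is below the threshold are
    treated as noise; long-lived pairs are kept as signal."""
    noise, signal = [], []
    for birth, death in diagram:
        if death - birth < min_persistence:
            noise.append((birth, death))
        else:
            signal.append((birth, death))
    return signal, noise

sig, noi = split_noise([(0.0, 0.05), (0.1, 0.9), (0.2, 0.22)], 0.1)
print(sig)  # only the long-lived cycle (0.1, 0.9) survives the filter
```

As the text notes, short bars are not always unimportant, so this filter is a heuristic rather than a rule.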

H0 Explained

H0: Studying a data set’s H0 is very important because it can tell us how many

components there are. When doing homology and not persistent homology this can

really allow us to see if there are two different components living in our data set

based upon its rank. However, when doing H0 in the barcode, if we let the epsilon balls grow indefinitely, there will always eventually be just one connected component, and therefore one long line. You can still see other lines that are long compared to the rest; these tell us that a component persisted for a long time before becoming connected to another component. This is where some of the ambiguity in Topological Data Analysis comes from: trying to judge which lines are important and which are not. For most data sets it is really easy to tell which is which through the combination of the barcode and the persistence diagram. For example, if our data set were Figure 3, we would see two long bars for a very long time because the distance between the two circles is so big. The subscript n in H0 through H3 counts cycles that do not have any boundaries: to calculate our Hn values we take our cycles and mod them by the boundaries. For H0 this is synonymous with the number of connected components, because we consider a point to be a 0-dimensional cycle. We use this in many of the higher dimensions to figure out large holes in the


data that last for a long time. We can learn a lot from realizing which cycles do not

bound a surface.

R Explained

R is a software tool that allows for programming to analyze data. R has the capability of creating histograms, pie charts, box-and-whisker plots, barcodes, scatterplots, and persistence diagrams. It is very helpful and something we used to

help create the barcodes for us. R is a culmination of all the ideas we have been

trying to put together. For example R uploads our set of 337 data points which is in

R^3, meaning we have three different variables we are testing on, and takes their

Euclidean distance to calculate when the points connect for H0. It also does this for

H1 calculating the distances until 1-dimensional cycles are formed. R is a script-

based language where you can make commands to accomplish tasks. R is also a free

software program, which is worked on by many people to make improvements and

libraries. A library has a list of functions and applications you can use in it. For

example, we used the TDA and PHOM library to make the barcodes. R is a very

useful resource because to do this by hand in higher dimensions would be

impossible.

Figure 3: http://aninfopage.blogspot.com/

Results

This graph shows us every player who is eligible for a PER (minimum 6.09 minutes per

game) and their relation to a line that goes through the average salary, PER, and WS.

There is a different color set for each team so we can tell if any team has more of

their players above or below the line. We have established that the players farthest

from the origin while still under this line are the players who are more efficient and a lower cost to their team (Figure 4). On the contrary, it shows that if a player is anywhere above

the line, we feel that his efficiency and/or WS is not worth the salary he is paid. We

noted a few exemplary points like LeBron James, Kevin Durant, and other notable

players to show where they stand on this graph. This diagram is important because it

offers a clear visual and combined with the spreadsheet of player statistics we can learn

a lot about each team.

Analysis of H0

The H0 barcodes as well as the persistence diagrams tell us a lot about the shape of

our data. There are many early deaths, and deaths happen far less frequently the

more steps that are taken. This tells us that there are lots of areas with many data

points crowded together, but these areas aren’t necessarily very close to each other.

This makes sense, as players’ salaries definitely tend to form clusters: many play for the league minimum, and then there is a veteran’s minimum, so many data points would have identical values in at least one of their dimensions. Moreover, most of these players won’t play much, so their PER and win shares will also be close to each other, near zero, meaning these points start out extremely close. Close points

will soon make cycles in H0, so it explains why there are so many early deaths.

Analysis of H1


H1: We found that there are 5 cycles that persist indefinitely. This can be explained by large gaps in players’ abilities and salaries. Beyond that, we need these cycles to not bound anything: not only do we need a large gap between certain players, but we also need there to be few other players in between them. One such cycle can be explained by a few players having really good PER and WS with poor salaries, compared to players with really high salaries and poor stats. A cycle like this would persist for a long time, and there would not be a bunch of players directly inside it. Two more cycles could be explained with the

aforementioned groups compared to the players doing well in all regards, high

salary, PER, and WS. The first two types of players described combined with the

players who do poorly in stats and get paid poorly would explain the last two

persisting cycles. We do not expect a lot of players to fall in between the really good players getting paid well and the players who play well but are not paid a lot. Similarly, we do not expect there to be a lot of players in between the really good players getting paid a lot and the players getting paid a lot who play poorly. This logic

also applies to the players who do poorly and get paid poorly. Using the 3-

dimensional cube of players we can see that there are not a lot of players in the

scenarios described therefore not allowing there to be a surface there. This causes

these cycles to persist indefinitely. This is the type of data we expect to see with our

hypothesis. We expect there to be a large misallocation of money relative to

statistics. This shows us that there are teams who are getting a really good deal on

players and some teams who are misusing their money. If all of the teams were

correctly using their money, we would expect there to be no one-dimensional cycles, because the majority of the points would fall inside a diagonal

cylinder. This would cause the cycles to bound a surface, the points in the cylinder,

and therefore have no cycles in H1.


Explanations of H2 and H3

H2: In our H2 data we concluded that there are no important cycles because they all

ended relatively close to when they were formed. This is not necessarily a bad thing; it just means that all of our 2-dimensional cycles bound a surface. We cannot gather any more information from our H2.

H3: We tried to give an appropriate analysis, but from the H3 persistence diagram we could not come up with any valid conclusion of why there would be any 3-dimensional cycles that do not bound a surface. We especially could not figure out why there would be only 3-dimensional cycles that do not bound a surface. However, we did

find what we were looking for in our H1.

Software Used and Creation of Data

To help us better understand our research, we used a variety of different software

packages to help us understand our data. The input for all the data was a normalized

data set, with a maximum value of 1 and a minimum value of 0. It was a three


dimensional data set, containing PER, Win Shares, and Salary. The software that

allowed us to compute the barcodes, as well as the data for the persistence

diagrams, was Perseus. Perseus is a software that computes the persistent

homology of a set of data after taking a scaling factor, step size, and number of steps,

dimension of the data, and the data itself as input. To create the persistence

diagrams, we used a Matlab script called persdia, which came bundled with the

Perseus program. Taking the birth and death times from Perseus, I wrote my own

program using the Python turtle to draw the barcodes for our group. Turtle is just a

library meant to help introductory programs understand programming in general,

but it was functionalities that allowed it to be able to draw the barcodes, so we used

it, as the barcodes are one of the most revealing things about a data set, and

somewhat easier to interpret than birth vs. death times graphs. The final software

used was the Mapper, a software that reveals clusters within data. It is what

produced the clustering diagram and created the 3-d cube of points.

To compare with the program I wrote that did the barcodes, we plotted

the barcodes using R. The results were very similar, the only differences being my

barcodes that lasted infinitely where the ones plotted by R didn’t, and R somehow

specifying a starting radius, as we never chose one for the R program.


Additional information on use of software

It’s important to note certain settings on the software packages used and

why these settings were used. Changing any of these settings even a small amount

would completely change the output. Most of the settings fell into somewhat of a

sweet spot determined by testing. For example, for the step size in Perseus we tried

1000, 20, 75, and many other step sizes until we found what we determined gave an

output that revealed the most about the data. For the Perseus software, a scaling

factor of .03, a step size of .01, a radius of .2 and 150 total steps were the settings for

the software. These were used for a number of reasons. A scaling factor of .03 was

used as it shrunk the size of each of the points considerably, which was necessary as

many were extremely close to begin with, so their epsilon balls would intersect

immediately, destroying a cycle before a single step was taken. The step size of .01 was used for much the same reason: because much of the data started so close together, it had to be small so as to allow the data points to die more gradually instead of all at once. The starting radius of .2 seemed to be the

sweet spot for producing a good output, if it was much lower the output would

contain many infinite cycles, and if it was much larger to begin with the points

would die as soon as they were born. 150 steps were used as it produced the best

shape of data in comparison with higher or lower step counts. Most of the infinite cycles probably would have intersected had we gone up to, say, 1,000 steps, but this

was much harder on the software itself, and it crashed almost every time I tried step

numbers that high. For the settings on Python Mapper we used the default values on

the GUI, with the only change being the Cover, which I switched to a balanced 1-d

cover over uniform 1-d, and changing the clustering setting to complete over single.

The reason we changed the cover was simply because it produced results we

thought were easier to interpret, and I thought the balanced 1-d cover made slightly

more sense with our data, as it had some areas where data points were heavily

crowded. We changed the clustering to complete linkage, as I thought it was more important that one entire cluster be close to another entire cluster, which complete linkage achieves by clustering based on the furthest points rather than the closest points.


Discussion

When our group considered how to determine what each player’s expected salary

should be, we took it in terms of what we thought were the two most important, stand-

out statistics, PER and win shares. In the spreadsheet we had with all players’ statistics, we created a formula that weighted each player’s normalized PER and win shares against the league average, and then we simply multiplied that number by the league average

salary. This is the formula we used:

SQRT((PER1/AvePER)*(WS1/AveWS))*AveSal

Where:

PER1 = Player’s normalized PER
AvePER = Normalized league average PER
WS1 = Player’s normalized WS
AveWS = Normalized league average WS
AveSal = League average salary
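The spreadsheet formula above translates directly into code. This is a sketch with our own function name and illustrative normalized inputs; note that a zero in either statistic yields an expected salary of zero, which is the behavior described for the lowest-deserved salaries below.

```python
import math

def expected_salary(per, ws, ave_per, ave_ws, ave_sal):
    """Geometric mean of a player's PER and WS ratios to the league
    average, scaled by the league average salary (the spreadsheet
    formula SQRT((PER1/AvePER)*(WS1/AveWS))*AveSal)."""
    return math.sqrt((per / ave_per) * (ws / ave_ws)) * ave_sal

# A player at exactly the league average in both statistics
# should come out to exactly the league average salary.
print(expected_salary(0.5, 0.4, 0.5, 0.4, 5_000_000))  # 5000000.0
```

Using the geometric mean rather than the arithmetic mean means a player must be above average in both statistics, not just one, to earn a large premium.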

We want to give a few examples of players whose salaries stood out to us in our

data. We have calculated each of these players’ expected salaries based on the weighted

PER and WS formula we used in the spreadsheet. Now, while these salaries we

computed were based on what players should have been paid for the 2013-14 season,

the statistics we have are from that season, so the numbers we have are more of a

“what the recorded statistics from the season would have been worth if they were

accurately predicted.” We will give examples of the most overpaid and underpaid players, the most deserving player for the league’s highest salary, the most accurately paid [2], the most deserving of the league’s lowest salary [3], and the most deserving of the league average [4].


Category | Player | Real salary - expected salary [1]
Most overpaid | Amar’e Stoudemire | $18,162K
Most underpaid | Isaiah Thomas | -$8,743K
Deserving of highest salary | Kevin Durant | $415K
Most accurately paid [2] | Luis Scola | $765
Lowest deserved salary (2) [3] | Tony Wroten, Jae Crowder | $1,160K, $788K respectively
Deserving of average salary [4] | Derrick Favors | $937K

1. Real salary-expected salary is our statistic to show the difference in what players are

being paid versus what we believe they should be paid.

2. The most accurately paid (Luis Scola) is based on the smallest difference between real

salary and expected salary.

3. The lowest deserved salaries (Tony Wroten, Jae Crowder) are based on, since our data is normalized, the player with the league’s lowest PER (Wroten) and the player with the league’s lowest WS (Crowder), both of which show up in our data as zero, so our calculations show they technically deserve a salary of zero.

4. The player who most deserves the average salary (Favors) is based on the player who

is closest to the league’s average PER and the league’s average WS.

Validation

There are many factors that teams use when determining a contract to offer a

player. Since we decided to use PER, average salary and win shares, we were able to

generate a formula to produce an expected salary. We feel this formula is an

accurate depiction of how much players deserve to be paid because it considers

what we believe are the most useful statistics. However, others who examine how

much each player is worth would likely use different formulas and methods to come

up with different salary figures for each player because they may value other

statistics differently. For example, Mike Ghirardo of California Polytechnic State


University created a different method to analyze how much a player should be paid.

He chose to compare salary to Adjusted Plus-Minus (APM), which measures and

compares how a team does when a particular player is on or off the court. However,

after analyzing his report we found his method to be flawed (and ours to be more

effective) because he believes his method has a lot of noise, and we agree.

Of course, every data set will have noise, but we expect the data to have less

than ten percent noise. Ghirardo’s data may have had a lot of noise because he was

only using one basic statistic to determine a player’s salary. Ghirardo believes his

data had a lot of noise because of the co-linearity between players, meaning one

player could play the majority of his minutes with a specific group of players. This

would lead to a lot of useless data because many players would have similar APM.

We found our data to have little noise, which is why we deemed our method to be

more effective. Our method focuses more on the individual instead of the team,

which gives us more relevant data when determining a player’s salary. We feel this

is due to using two advanced statistics, as well as a basic statistic.

Minor Validation Issue

We ran two different software packages in our research to double-check our results. We used both R/phom and Perseus/Mapper, but instead of finding the same barcodes as we expected, there was a slight difference: the H1 barcode from R was missing the five infinitely persisting cycles. This is a meaningful discrepancy, because those cycles are what support our hypothesis. However, we believe the Perseus results are the correct ones, because Perseus is better at handling larger data sets. H2 did not run in R at all; we received an insufficient-memory error. We still stand by our results, but we note this discrepancy as a disclaimer.
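For readers comparing the two barcode outputs: an H0 barcode simply records the distances at which connected components merge as the Vietoris-Rips threshold grows, which is the part both packages compute. That computation can be sketched self-containedly with single-linkage merging on a toy point set (hypothetical points, not our NBA data):

```python
from itertools import combinations

def h0_death_times(points):
    """Distances at which components merge in a Vietoris-Rips filtration.
    Each merge ends one H0 bar [0, d); the final component never dies
    and is omitted here."""
    edges = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5, i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
    )
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for d, i, j in edges:  # process edges in order of increasing distance
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths

# Two tight clusters far apart: two short bars, then one long merge.
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
bars = h0_death_times(pts)
```

On these points the bars end at roughly 0.1, 0.1, and 4.9: the long final bar is the signature of well-separated clusters, which is the kind of feature the H0 barcodes from R and Perseus agreed on.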

Limitations

We were not able to meet all of our research goals. After collecting the data and running it through Perseus, it became apparent that we would not be able to learn anything about how well specific teams in the NBA manage their money, because Perseus does not allow you to identify specific data points. Without knowing which data points were associated with which cycles, we decided it would be impossible to analyze players or teams individually using topology.

Conclusion

Contrary to popular belief, our data shows that of the 337 players we analyzed who were eligible for a PER, only 129 were overpaid by our standards. That leaves over 60% of the league (again, counting only players eligible for a PER; there are about 440 players altogether) who were paid less than their statistics say they deserved during the 2013-14 NBA season. However, while the number of underpaid players was nearly double the number of overpaid players, the amounts by which overpaid players exceeded their expected salaries were much larger than the gaps between real and expected salaries for the underpaid. Prime examples include Amar'e Stoudemire, Carmelo Anthony, Dwyane Wade, and Russell Westbrook. What these players share is marketability: they have become public figures over their careers, so fans often attend games solely to be able to say they saw those players in person. This information can help NBA front offices determine how to save money on one player to make room under the salary cap for others.

We also learned that over half of all NBA contracts are very similar in value relative to performance. There are, however, several groups of outliers whose value is completely different from what their performance would predict. These contracts are likely the most important decisions an organization makes: it has either found tremendous value relative to the contract, paid a premium for a premium player, or wasted its money on an underperformer. The fact that outliers exist at all shows that talent evaluation in the NBA is far from perfect, and that some teams are likely better at it than others. Moreover, with only eight players who were both paid and played like superstars, and 30 teams in the league, teams should probably focus on getting great draft picks on cheap contracts and avoid overpaying middling players. With only an 8-in-30 chance of having a properly paid superstar, most teams should focus on winning without one, as most of them won't have a choice.

Contributions

Zach contributed the abstract (edited by Joey), introduction (edited by Joey), math background, Vietoris-Rips complex, homology (edited by Josh), persistent homology (edited by Josh), barcodes (edited by Lucas), persistence diagrams (edited by Lucas), H0 explained, the explanation of the H1-H3 persistence diagrams, the 3D scatterplot, the Minor Validation Issue, R explained, and the barcodes from R.

Lucas contributed the barcodes, the analysis of H0, and the explanations of all the software we used.

Adam contributed the explanations of PER (edited by Joey), the salary cap, and win shares, as well as the validation.

Joey contributed the discussion, the conclusion, and the explanation of the 3D scatterplot.


References

"3d Scatterplot for MS Excel." Doka.ch. Doka Life Cycle Assessments, n.d. Web. 10

Apr. 2015.

Brixius, Nathan. "NBA Rosters and Team Changes: 2013-2014." Nathan Brixius.

Wordpress, 08 Oct. 2014. Web. 24 Mar. 2015.

Casciaro, Joseph. "Get To Know An Advanced Stat: Win Shares." TheScore. N.p., 12

Feb. 2015. Web. 31 Mar. 2015.

Ghirardo, Mike. "NBA Salaries: Assessing True Player Value." Calpoly.edu. Digital Commons, n.d. Web. 20 Apr. 2015.

Hollinger, John. “What Is PER?” ESPN. ESPN Internet Ventures, 11 Aug. 2011. Web.

26 Mar. 2015.

Smith, Sekou. "2013-14 NBA Salary Cap Figure Set at $58.679 Million." NBA.com

Hang Time Blog. NBA, 9 July 2013. Web. 07 Apr. 2015.

"Topological Data Analysis with R." R-bloggers, n.d. Web. http://www.r-bloggers.com/topological-data-analysis-with-r/

Weisstein, Eric. "Homology." Wolfram Mathworld. Wolfram Alpha, n.d. Web. 24 Mar.

2015.


R Code

> NBA.Graphs <- read.csv("/mnt/nfs/netapp2/students/zmmeyer/Downloads/NBA Graphs.csv", header = FALSE)
> View(NBA.Graphs)
> data <- data.matrix(NBA.Graphs, rownames.force = NA)
> library("TDA")
Loading required package: FNN
Loading required package: igraph
Loading required package: parallel
Loading required package: scales
> library(phom)
Loading required package: Rcpp
> head(data)
        V1   V2    V3
[1,] 29.90 19.2 17832
[2,] 29.40 15.9 19067
[3,] 26.97 14.3 14693
[4,] 26.54 10.4  5375
[5,] 26.18  7.9  4916
[6,] 25.98 12.2 18668
> max_dim <- 0  # dimension = 0 (compute H0 only)
> max_f <- 1    # maximum filtration value displayed on the x-axis
> bball <- pHom(data, dimension = max_dim, max_filtration_value = max_f, mode = "vr", metric = "euclidean")
> plotBarcodeDiagram(bball, max_dim, max_f, title = "H0 of Stats vs Salary")

# Switched to normalized data:
> NORMD <- read.delim("/mnt/nfs/netapp2/students/zmmeyer/Downloads/NORMD.txt", header = FALSE)
> View(NORMD)
> data <- data.matrix(NORMD, rownames.force = NA)
> bball <- pHom(data, dimension = max_dim, max_filtration_value = max_f, mode = "vr", metric = "euclidean")
> head(data)
            V1        V2        V3  V4
[1,] 0.7817508 1.0000000 1.0000000 0.2
[2,] 0.8368823 0.9797325 0.8358209 0.2
[3,] 0.6416231 0.8812323 0.7562189 0.2
[4,] 0.2256596 0.8638022 0.5621891 0.2
[5,] 0.2051694 0.8492096 0.4378109 0.2
[6,] 0.8190706 0.8411026 0.6517413 0.2
> plotBarcodeDiagram(bball, max_dim, max_f, title = "H0 of Stats vs Salary")
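The NORMD matrix in the transcript above contains only values in [0, 1], which suggests the columns were min-max normalized. The normalization step itself is not shown; the following Python sketch illustrates how such a rescaling could be produced (an assumption about how NORMD was prepared, not a reconstruction of the actual preprocessing):

```python
def min_max_normalize(column):
    """Linearly rescale a list of numbers onto [0, 1]:
    the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

# The six PER values from the unnormalized head(data) output above.
per = [29.90, 29.40, 26.97, 26.54, 26.18, 25.98]
normed = min_max_normalize(per)
```

Normalizing each column this way keeps statistics measured on very different scales (PER around 30, salaries in the tens of millions) from dominating the Euclidean distances used by the Vietoris-Rips construction.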

> max_dim <- 1
> max_f <- 1
> bball <- pHom(data, dimension = max_dim, max_filtration_value = max_f, mode = "vr", metric = "euclidean")
> plotBarcodeDiagram(bball, max_dim, max_f, title = "H1 of Stats vs Salary")

We attempted to compute H2 as well, but received an insufficient-memory error.
