NBA Salary Prediction, by Edbert Puspito (link to code)
Imagine: You are the Lakers GM. The team is at its worst right now: 16-65, last place in the Western Conference. Kobe will retire, and a bunch of players have expiring contracts. You definitely need to rebuild, or L.A.'s top line from ticket sales and merchandise will drop hard.
Imagine: The salary cap for the 16-17 season is projected to be 89 million, and the remaining contracts total 26 million. That leaves you 63 million free before you hit the salary cap.
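The cap arithmetic above can be sketched in a few lines. The cap and committed figures come from the slide; the function names and the list of offers are hypothetical and only illustrate how a set of signings would be checked against the remaining space.

```python
SALARY_CAP_M = 89  # projected 16-17 salary cap in millions (from the slide)
COMMITTED_M = 26   # remaining contracts in millions (from the slide)


def cap_space(cap_m, committed_m):
    """Money left before hitting the cap, in millions."""
    return cap_m - committed_m


def fits_under_cap(offers_m, cap_m=SALARY_CAP_M, committed_m=COMMITTED_M):
    """True if the combined offers stay within the remaining cap space."""
    return sum(offers_m) <= cap_space(cap_m, committed_m)


space = cap_space(SALARY_CAP_M, COMMITTED_M)

# Hypothetical first-year salaries for four signings, in millions.
hypothetical_offers = [20, 15, 10, 8]

print(f"Cap space: {space}M")
print(f"Offers fit under the cap: {fits_under_cap(hypothetical_offers)}")
```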
Your challenge: Who to re-sign? Who to target in free agency? How much should you pay them?
Data science to the rescue: We created a model that predicts a player's salary based ONLY on their on-court performance. Find out who is overpaid and who is underpaid relative to what their performance predicts. Find out which team's GM is the best.
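One plausible way to read "overpaid" and "underpaid" is the gap between a player's actual salary and the salary the model predicts. The sketch below assumes that reading; the players, the numbers, and the 0.5M tolerance band are all hypothetical and are not results from the deck.

```python
def salary_gaps(actual, predicted):
    """Return actual minus predicted salary (in millions) for each player."""
    return {name: actual[name] - predicted[name] for name in actual}


def verdict(gap, tolerance=0.5):
    """Label a gap; the 0.5M tolerance band is an arbitrary assumption."""
    if gap > tolerance:
        return "overpaid"
    if gap < -tolerance:
        return "underpaid"
    return "fairly paid"


# Hypothetical players and salaries in millions, for illustration only.
actual_salary = {"Player A": 12.0, "Player B": 2.0, "Player C": 5.0}
predicted_salary = {"Player A": 9.5, "Player B": 6.0, "Player C": 5.25}

gaps = salary_gaps(actual_salary, predicted_salary)

# Most underpaid first, most overpaid last.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap:+.2f}M ({verdict(gap)})")
```

Averaging these gaps per team would be one rough way to compare GMs, although the deck does not say how that comparison was made.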
Hypothesis: Many factors can affect a player's salary: 1. Performance. 2. Ability to attract fans. 3. Market demand. 4. Luck (hype). 5. Etc. We assume performance explains the majority of a player's salary; factors 2, 3, and even 4 are also tied to factor 1.
Salary trivia (or not so): Highest salary ever: 33M, MJ, 97-98 season. The closest to the basketball god: 30M, KB24, 12-13 and 13-14 seasons. In the 15-16 season, at least 10 players have a salary > 20M. Average: ~5M. Median: 2.5M. Many players are underpaid, others overpaid.
Dataset: 4 seasons, from 2012 to 2016. Statistics from NBA.com, including player bio, basic stats, and advanced stats. Salaries were taken from ESPN.com and adjusted for inflation. Total of 1600 rows and ~50 features.
Some graphs
Straight ball or curve ball? Scores by model: Base (linear): 0.596. Ridge (poly): 0.604. ElasticNet (poly): 0.581. Random Forest: 0.654. Extra Trees: 0.667.
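A comparison like this could be set up along the following lines with scikit-learn. This is a sketch under several assumptions: the data is synthetic rather than the NBA dataset, the score is taken to be cross-validated R² (the slide does not name the metric), and the polynomial degree, regularization strengths, and estimator counts are placeholder values.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the player stats and salaries.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(scale=0.5, size=200)

# Model names follow the slide; all hyperparameters are assumptions.
candidates = {
    "Base - linear": LinearRegression(),
    "Ridge - poly": make_pipeline(
        StandardScaler(), PolynomialFeatures(degree=2), Ridge(alpha=1.0)
    ),
    "ElasticNet - poly": make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=2),
        ElasticNet(alpha=0.1, max_iter=10000),
    ),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Extra trees": ExtraTreesRegressor(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated R^2 per model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The numbers printed here describe the synthetic data only and are not comparable to the scores on the slide.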
Economic data: We added data on players' official Twitter followers and team ticket sales, and the score went up to 0.608 (0.699 with random forest regressors). This indicates that popularity does affect players' salaries, but we focus on performance because only a small amount of popularity data could be crawled.
So, what model do we use? The forest of forests: 100 Extra Trees models. Each Extra Trees model has a minimum leaf size of 3, a depth of 12, and 50 estimators, and is trained on a different random sample (70%) of the training data. The score ranges from 0.64 to 0.68.
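The "forest of forests" could look roughly like the sketch below, using scikit-learn's `ExtraTreesRegressor` with the settings from the slide (minimum leaf size 3, depth 12, 50 estimators, 70% of the training rows per model). Two things are assumptions because the slide does not state them: the 70% sample is drawn without replacement, and the member predictions are combined by simple averaging. The helper names and the demo data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def fit_forest_of_forests(X, y, n_models=100, sample_frac=0.7, seed=0):
    """Fit n_models Extra Trees regressors, each on a random subset of rows."""
    rng = np.random.default_rng(seed)
    n_rows = X.shape[0]
    n_sample = int(sample_frac * n_rows)
    models = []
    for i in range(n_models):
        # Assumption: rows are sampled without replacement.
        idx = rng.choice(n_rows, size=n_sample, replace=False)
        model = ExtraTreesRegressor(
            n_estimators=50, max_depth=12, min_samples_leaf=3, random_state=i
        )
        model.fit(X[idx], y[idx])
        models.append(model)
    return models


def predict_forest_of_forests(models, X):
    """Assumption: the ensemble prediction is the mean of member predictions."""
    return np.mean([m.predict(X) for m in models], axis=0)


# Synthetic stand-in data, not the NBA dataset.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 8))
y_demo = 5.0 + 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 1] + rng.normal(scale=0.5, size=300)

# Only 5 member models here to keep the demo fast; the slide uses 100.
demo_models = fit_forest_of_forests(X_demo[:240], y_demo[:240], n_models=5)
demo_pred = predict_forest_of_forests(demo_models, X_demo[240:])
print(demo_pred[:5])
```

Training each member on a different 70% sample is also one possible explanation for why the reported score varies between runs.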
Findings:
Findings: As the boss of the Lakers, Kobe indeed has all the means to make his statistics look beautiful. And he is really, really famous.
Findings: Overrated? Maybe, as the model only considers performance. What is surprising is that, despite all the ticket sales he generates, he is overpaid by just 1.5M, suggesting he may be underpaid performance-wise.
Findings: FYI, this guy is considered underrated in the 15-16 season, as he is paid only 2M by the Hornets.
Findings: Had a breakthrough performance by successfully defending LeBron in the 13-14 NBA Finals, and got Finals MVP plus a championship ring. Saw more playing time and won Defensive Player of the Year in 14-15. Re-signed his contract in 15-16, hence the jump in salary (and overrated-ness, lol).
Findings: A nobody, and considered a risky signing in 12-13 because of his injury record. The rest is history. His contract will expire at the end of the 16-17 season; expect a rocket jump.
Actionable insight: Assuming their salaries won't change much, the Lakers can sign those players. Maybe add some overrated players who can mentor teammates or attract fans to games. The Lakers have to pay me a data science consultancy fee to get the full result :)
Actionable insight: Fire the Nets GM, or whoever made the signing decisions!
Let's have some fun
Challenges along the journey: Feature engineering didn't help much. Couldn't find new features to create. Not enough data. No economic features.
Future ideas: Gather economic data, such as social media followers (Facebook) and activity for every player, plus team ticket and jersey sales, and see if the additional data increases the models' scores. Is 0.66 the ceiling for salary prediction if only performance data is used? Find out if MJ is really overrated/overpriced. Is there a way to create features from the basic statistics to improve the score?