Predicting NBA awards

Motivation:

Problem

Determining awards for NBA players can be controversial. With so many great talents in the league, it can be hard for voters to choose. The voters are members of the media (sportswriters and broadcasters) from the US and Canada. Not everyone will share the same opinion, so there will inevitably be disagreement.

Solution

Using a system based on stats can reduce voter bias and help give the award to the player who most deserves it. The goal of this project is to identify and use a relationship between the stats of this season's NBA players (e.g., RAPTOR statistics, wins above replacement) and the stats of award winners from previous seasons.

Impact

If we are able to find such a relationship, we would be able to predict award winners based on their RAPTOR stats. In addition, if the predictions turn out to be correct, that would lend a lot of credibility to FiveThirtyEight's RAPTOR statistic.

People who would be interested in this data include sports bettors. Sports betting is a multibillion-dollar industry, and anyone putting money on the line wants a good chance of winning. Those who bet on NBA awards could benefit from this.

Dataset

Detail

I will use a FiveThirtyEight dataset of NBA player stats to observe the following features for each player:

Column Description
player_name Player name
player_id Basketball-Reference.com player ID
season Season
season_type Regular season (RS) or playoff (PO)
team Basketball-Reference ID of team
poss Possessions played
mp Minutes played
raptor_box_offense Points above average per 100 possessions added by player on offense, based only on box score estimate
raptor_box_defense Points above average per 100 possessions added by player on defense, based only on box score estimate
raptor_box_total Points above average per 100 possessions added by player, based only on box score estimate
raptor_onoff_offense Points above average per 100 possessions added by player on offense, based only on plus-minus data
raptor_onoff_defense Points above average per 100 possessions added by player on defense, based only on plus-minus data
raptor_onoff_total Points above average per 100 possessions added by player, based only on plus-minus data
raptor_offense Points above average per 100 possessions added by player on offense, using both box and on-off components
raptor_defense Points above average per 100 possessions added by player on defense, using both box and on-off components
raptor_total Points above average per 100 possessions added by player on both offense and defense, using both box and on-off components
war_total Wins Above Replacement combined across the regular season and playoffs
war_reg_season Wins Above Replacement for regular season
war_playoffs Wins Above Replacement for playoffs
predator_offense Predictive points above average per 100 possessions added by player on offense
predator_defense Predictive points above average per 100 possessions added by player on defense
predator_total Predictive points above average per 100 possessions added by player on both offense and defense
pace_impact Player impact on team possessions per 48 minutes

RAPTOR is FiveThirtyEight's new NBA statistic. It stands for 'Robust Algorithm (using) Player Tracking (and) On/off Ratings.' It takes advantage of modern NBA player-tracking and play-by-play data. RAPTOR is a plus/minus statistic that measures a player's offensive and defensive contribution per 100 possessions relative to a league-average player.
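The column descriptions suggest two identities: raptor_total is the sum of raptor_offense and raptor_defense, and war_total adds regular-season and playoff WAR. A quick check against the five rows printed by df_nba.head() below is consistent with both. The tolerance of 1e-5 is an assumption chosen to absorb the six-decimal rounding in the display, and five rows make this a sanity check rather than a proof.

```python
import pandas as pd

# The five rows of df_nba.head() shown in the notebook output (Alex Abrines, OKC)
sample = pd.DataFrame({
    "season": [2017, 2017, 2018, 2018, 2019],
    "season_type": ["PO", "RS", "PO", "RS", "RS"],
    "raptor_offense": [-0.892617, 0.654933, 1.875157, -0.211818, -4.040157],
    "raptor_defense": [-6.561258, -0.724233, 0.740292, -1.728584, 1.885618],
    "raptor_total": [-7.453875, -0.069300, 2.615450, -1.940401, -2.154538],
    "war_reg_season": [0.000000, 1.447708, 0.000000, 0.465912, 0.178167],
    "war_playoffs": [-0.198700, 0.000000, 0.311392, 0.000000, 0.000000],
    "war_total": [-0.198700, 1.447708, 0.311392, 0.465912, 0.178167],
})

# raptor_total should equal raptor_offense + raptor_defense (up to display rounding)
raptor_gap = (sample["raptor_offense"] + sample["raptor_defense"] - sample["raptor_total"]).abs()

# war_total should equal war_reg_season + war_playoffs
war_gap = (sample["war_reg_season"] + sample["war_playoffs"] - sample["war_total"]).abs()

print(raptor_gap.max() < 1e-5, war_gap.max() < 1e-5)
```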

In [6]:
import pandas as pd
# gets modern data since 2014
df_nba = pd.read_csv('modern_RAPTOR_by_team.csv')
df_nba.head()
Out[6]:
player_name player_id season season_type team poss mp raptor_box_offense raptor_box_defense raptor_box_total ... raptor_offense raptor_defense raptor_total war_total war_reg_season war_playoffs predator_offense predator_defense predator_total pace_impact
0 Alex Abrines abrinal01 2017 PO OKC 172 80 0.420828 -2.862454 -2.441626 ... -0.892617 -6.561258 -7.453875 -0.198700 0.000000 -0.198700 -3.298178 -6.535113 -9.833292 0.334678
1 Alex Abrines abrinal01 2017 RS OKC 2215 1055 0.770717 -0.179621 0.591096 ... 0.654933 -0.724233 -0.069300 1.447708 1.447708 0.000000 0.339201 -0.611866 -0.272665 0.325771
2 Alex Abrines abrinal01 2018 PO OKC 233 110 1.123761 -1.807486 -0.683725 ... 1.875157 0.740292 2.615450 0.311392 0.000000 0.311392 2.877519 -0.520954 2.356566 0.260479
3 Alex Abrines abrinal01 2018 RS OKC 2313 1134 0.236335 -1.717049 -1.480714 ... -0.211818 -1.728584 -1.940401 0.465912 0.465912 0.000000 -0.482078 -1.172227 -1.654306 -0.528330
4 Alex Abrines abrinal01 2019 RS OKC 1279 588 -3.215683 1.078399 -2.137285 ... -4.040157 1.885618 -2.154538 0.178167 0.178167 0.000000 -4.577678 1.543282 -3.034396 -0.268013

5 rows × 23 columns
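Each row covers one player, one season type, and (going by the file name and the team column) one team, so a player traded mid-season would appear once per team. Award voting is based on the regular season, so one possible preprocessing step keeps only RS rows and merges team stints into one row per player and season. The following is a sketch under stated assumptions: it sums possessions, minutes, and regular-season WAR, and takes a possession-weighted average of the per-100 RAPTOR ratings. This is not necessarily how FiveThirtyEight combines stints, and regular_season_by_player is a hypothetical helper name.

```python
import pandas as pd

# Per-100-possession rating columns that need a weighted average, not a sum
RATE_COLS = ["raptor_offense", "raptor_defense", "raptor_total"]

def regular_season_by_player(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse team stints into one regular-season row per player and season.

    Assumption: counting columns (poss, mp, war_reg_season) are summed and the
    RAPTOR rating columns are averaged, weighted by possessions.
    """
    rs = df[df["season_type"] == "RS"].copy()
    # Turn each per-100 rating into a possession-weighted total so it can be summed
    for col in RATE_COLS:
        rs[col] = rs[col] * rs["poss"]
    keys = ["player_id", "player_name", "season"]
    totals = rs.groupby(keys, as_index=False)[["poss", "mp", "war_reg_season"] + RATE_COLS].sum()
    # Divide back by total possessions to recover the weighted average rating
    for col in RATE_COLS:
        totals[col] = totals[col] / totals["poss"]
    return totals

# Alex Abrines rows from df_nba.head(), copied from the output above
abrines = pd.DataFrame({
    "player_id": ["abrinal01"] * 5,
    "player_name": ["Alex Abrines"] * 5,
    "season": [2017, 2017, 2018, 2018, 2019],
    "season_type": ["PO", "RS", "PO", "RS", "RS"],
    "team": ["OKC"] * 5,
    "poss": [172, 2215, 233, 2313, 1279],
    "mp": [80, 1055, 110, 1134, 588],
    "raptor_offense": [-0.892617, 0.654933, 1.875157, -0.211818, -4.040157],
    "raptor_defense": [-6.561258, -0.724233, 0.740292, -1.728584, 1.885618],
    "raptor_total": [-7.453875, -0.069300, 2.615450, -1.940401, -2.154538],
    "war_reg_season": [0.0, 1.447708, 0.0, 0.465912, 0.178167],
})

print(regular_season_by_player(abrines))
# With the full file loaded: df_rs = regular_season_by_player(df_nba)
```

Abrines never changed teams in these rows, so the sketch only drops his playoff rows. The weighting matters for traded players, where a short stint should count less than a long one.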

In [5]:
# 2022-2023 season
df_nba2023 = pd.read_csv('latest_RAPTOR_by_team.csv')
df_nba2023.head()
Out[5]:
player_name player_id season season_type team poss mp raptor_box_offense raptor_box_defense raptor_box_total ... raptor_offense raptor_defense raptor_total war_total war_reg_season war_playoffs predator_offense predator_defense predator_total pace_impact
0 Precious Achiuwa achiupr01 2023 RS TOR 1784 868 -1.925091 0.742752 -1.182340 ... -1.263367 -0.003950 -1.267317 0.650937 0.650937 0 -1.726975 0.248174 -1.478801 -0.872994
1 Steven Adams adamsst01 2023 RS MEM 2391 1133 -0.783499 3.910489 3.126990 ... 0.382174 3.062029 3.444203 3.586879 3.586879 0 -0.004884 3.554965 3.550081 0.177138
2 Bam Adebayo adebaba01 2023 RS MIA 3988 1965 -1.600716 3.206394 1.605678 ... -0.600442 3.188671 2.588229 5.327369 5.327369 0 -0.387635 3.187949 2.800314 -0.461195
3 Ochai Agbaji agbajoc01 2023 RS UTA 1308 610 -1.102455 -1.333894 -2.436349 ... -0.934363 -1.079027 -2.013390 0.228766 0.228766 0 -1.207554 -2.200173 -3.407727 -0.212811
4 Santi Aldama aldamsa01 2023 RS MEM 2658 1232 -0.923960 0.684572 -0.239389 ... -0.939673 0.214642 -0.725031 1.278124 1.278124 0 -0.731312 1.248535 0.517223 0.420301

5 rows × 23 columns

Potential Problems

Since the RAPTOR statistic is relatively new, I am not sure how directly it relates to award outcomes. Also, awards are not decided by numbers alone. Other factors matter too, including the eye test, the media, and popularity.

It is also much harder to predict awards early in the season, when few games have been played. Each team plays 82 regular-season games, so the further into the season we are, the more accurate the prediction is likely to be.
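One way to account for early-season noise is to require a minimum number of minutes that grows as the season progresses. The sketch below assumes a hypothetical threshold of 20 minutes per team game played. Both the threshold and the function names are illustrative, not an official eligibility rule.

```python
import pandas as pd

GAMES_PER_SEASON = 82  # regular-season games per team

def season_progress(team_games_played: int) -> float:
    """Fraction of the regular season a team has completed."""
    return team_games_played / GAMES_PER_SEASON

def filter_by_minutes(df: pd.DataFrame, team_games_played: int,
                      min_minutes_per_game: float = 20.0) -> pd.DataFrame:
    """Drop players with too few minutes for the current point in the season.

    min_minutes_per_game is a hypothetical threshold, not an official NBA rule.
    """
    required_minutes = team_games_played * min_minutes_per_game
    return df[df["mp"] >= required_minutes]

# Minutes for the 2022-23 players shown in df_nba2023.head()
sample_2023 = pd.DataFrame({
    "player_name": ["Precious Achiuwa", "Steven Adams", "Bam Adebayo",
                    "Ochai Agbaji", "Santi Aldama"],
    "mp": [868, 1133, 1965, 610, 1232],
})

# Hypothetical checkpoint: halfway through the season (41 of 82 team games)
print(season_progress(41))  # → 0.5
print(filter_by_minutes(sample_2023, team_games_played=41)["player_name"].tolist())
```

At that checkpoint the requirement is 820 minutes, so only Agbaji (610) would be dropped. Earlier in the season the bar is lower, but each player's stats rest on a smaller sample.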

Method:

The method I plan to use is clustering. Given the RAPTOR stats and the other features above, the goal is to find the current player whose stats are most similar to those of a previous award winner and to use that to predict this year's winner. Previous award winners are not included in the dataset, so they will have to be researched and compiled separately. I'm not sure this will be the best method, because award winners' stats vary from year to year.
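Strictly speaking, finding the current player most similar to a past winner is a nearest-neighbor search rather than clustering. A minimal sketch under that reading standardizes a few features and ranks current players by Euclidean distance to a winner's stat line. The feature list, the standardization against the current season, and the winner_profile values are all assumptions. The profile shown is a placeholder, not a real award winner's stats.

```python
import numpy as np
import pandas as pd

# Assumed feature set; other columns from the dataset table could be used instead
FEATURES = ["raptor_offense", "raptor_defense", "war_reg_season"]

def rank_by_similarity(current: pd.DataFrame, winner_profile: dict) -> pd.DataFrame:
    """Rank current players by Euclidean distance to a past winner's stat line.

    Each feature is standardized with the current season's mean and standard
    deviation so that no single stat dominates the distance.
    """
    X = current[FEATURES].astype(float)
    mean, std = X.mean(), X.std(ddof=0)
    z_players = (X - mean) / std
    target = pd.Series(winner_profile)[FEATURES].astype(float)
    z_target = (target - mean) / std
    # Row-wise Euclidean distance in standardized feature space
    distance = np.sqrt(((z_players - z_target) ** 2).sum(axis=1))
    return current.assign(distance=distance).sort_values("distance")

# 2022-23 rows from df_nba2023.head()
current_2023 = pd.DataFrame({
    "player_name": ["Precious Achiuwa", "Steven Adams", "Bam Adebayo",
                    "Ochai Agbaji", "Santi Aldama"],
    "raptor_offense": [-1.263367, 0.382174, -0.600442, -0.934363, -0.939673],
    "raptor_defense": [-0.003950, 3.062029, 3.188671, -1.079027, 0.214642],
    "war_reg_season": [0.650937, 3.586879, 5.327369, 0.228766, 1.278124],
})

# Placeholder profile; replace with a real previous winner's stats once researched
winner_profile = {"raptor_offense": 0.0, "raptor_defense": 3.0, "war_reg_season": 5.0}

ranked = rank_by_similarity(current_2023, winner_profile)
print(ranked[["player_name", "distance"]])
```

Comparing a past winner against the current season's mean and standard deviation is a simplification. Standardizing each winner within their own season would better handle year-to-year shifts in league-wide stats, which is one way to address the concern that winners' stats vary from year to year.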