Tracking the metrics responsible for measuring a player's positive impact after the NBA All-Star Game has been difficult for basketball analysts over the past few decades, owing to the league's unpredictability (injuries and other factors) and the differing opinions of nearly every analyst.
Who will be the NBA's Most Valuable Player (MVP) and Defensive Player of the Year (DPOY), and which five players (2 guards, 2 forwards, 1 center) will make up the All-NBA First Team? The goal of this project is to determine which metrics have been most influential in deciding the league's MVP and to use them to find the current season's MVP as well as the DPOY and All-NBA First Team.
If we can create a program that identifies the most important metrics for selecting the MVP, we will be able to use those metrics to determine this year's league MVP as well as the DPOY and All-NBA First Team.
We will use the Kaggle 1982-2022 NBA MVP Player Statistics Dataset to determine the metrics most important to selecting the MVP, and then use the Kaggle 2022-2023 Player Stats to find the current MVP leader, the Defensive Player of the Year, and the members of the All-NBA First Team.
1. Games Played (GP)
2. Minutes per Game (MPG)
3. Usage Percentage (USG%)
4. Turnover Percentage (TO%)
5. Effective Field Goal Percentage (eFG%)
6. True Shooting Percentage (TS%)
7. Points per Game (PPG)
8. Rebounds per Game (RPG)
9. Assists per Game (APG)
10. Steals + Blocks per Game (SPG + BPG)
11. Offensive Rating (ORtg)
12. Defensive Rating (DRtg)
import pandas as pd
df = pd.read_csv("NBA Stats 202223 All Stats NBA Player Props Tool.csv")
df.head()
|  | RANK | NAME | TEAM | POS | AGE | GP | MPG | USG% | TO% | FTA | ... | APG | SPG | BPG | TPG | P+R | P+A | P+R+A | VI | ORtg | DRtg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Dejounte Murray | Atl | G | 26.4 | 56 | 36.4 | 24.8 | 11.1 | 135 | ... | 6.1 | 1.5 | 0.3 | 2.3 | 26.5 | 27.0 | 32.5 | 2.4 | 9.6 | 114.4 |
| 1 | 2 | Trae Young | Atl | G | 24.4 | 54 | 35.3 | 33.2 | 17.2 | 478 | ... | 10.2 | 1.1 | 0.1 | 4.1 | 29.9 | 37.2 | 40.1 | 1.6 | 11.7 | 116.6 |
| 2 | 3 | De'Andre Hunter | Atl | F-G | 25.2 | 51 | 31.7 | 19.8 | 9.1 | 176 | ... | 1.4 | 0.5 | 0.2 | 1.2 | 20.0 | 17.1 | 21.4 | 1.3 | 16.6 | 110.1 |
| 3 | 4 | John Collins | Atl | F-C | 25.4 | 51 | 31.0 | 17.0 | 10.6 | 128 | ... | 1.2 | 0.7 | 1.2 | 1.2 | 20.2 | 14.4 | 21.5 | 2.8 | 18.2 | 106.7 |
| 4 | 5 | Bogdan Bogdanovic | Atl | G | 30.5 | 36 | 29.6 | 20.2 | 9.2 | 50 | ... | 3.0 | 0.9 | 0.3 | 1.3 | 18.0 | 17.6 | 21.0 | 2.0 | 40.7 | 114.0 |
5 rows × 29 columns
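The goal of picking 2 guards, 2 forwards, and 1 center can be sketched against the `POS`, `SPG`, and `BPG` columns shown in the preview. The block below is only an illustration under stated assumptions: the toy rows and player names are made up, hybrid labels such as `F-G` are assigned to the first listed position, `P+R+A` is presumed to be points + rebounds + assists, and the equal-weight z-score composite (`SCORE`) is a placeholder rather than the project's final ranking system. `pick_first_team`, `STOCKS`, and `PRIMARY_POS` are hypothetical names.

```python
import pandas as pd

# Toy rows standing in for the 2022-23 file; all values are made up.
toy = pd.DataFrame({
    "NAME": ["G1", "G2", "G3", "F1", "F2", "F3", "C1", "C2"],
    "POS": ["G", "G", "G-F", "F", "F-C", "F-G", "C", "C-F"],
    "P+R+A": [40.0, 35.0, 30.0, 38.0, 33.0, 20.0, 45.0, 25.0],
    "SPG": [1.5, 1.0, 0.8, 1.2, 0.7, 0.5, 1.0, 0.6],
    "BPG": [0.3, 0.2, 0.4, 0.8, 1.2, 0.2, 2.0, 1.5],
})

# Roster slots per position: 2 guards, 2 forwards, 1 center.
SLOTS = {"G": 2, "F": 2, "C": 1}


def pick_first_team(players, score_cols):
    """Rank players within their primary position and keep the top slots."""
    out = players.copy()
    # Metric 10 from the list: steals + blocks per game.
    out["STOCKS"] = out["SPG"] + out["BPG"]
    # Assumption: a hybrid label like "F-G" counts as its first position.
    out["PRIMARY_POS"] = out["POS"].str.split("-").str[0]
    # Placeholder composite: sum of z-scores, every column weighted equally.
    cols = list(score_cols) + ["STOCKS"]
    z = (out[cols] - out[cols].mean()) / out[cols].std(ddof=0)
    out["SCORE"] = z.sum(axis=1)
    picks = [
        out[out["PRIMARY_POS"] == pos].nlargest(n, "SCORE")
        for pos, n in SLOTS.items()
    ]
    return pd.concat(picks)[["NAME", "PRIMARY_POS", "SCORE"]]


team = pick_first_team(toy, ["P+R+A"])
print(team)
```

With the real data, `toy` would be replaced by `df`, and the columns and weights in the composite would come from whatever the MVP analysis finds most important.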
df_1 = pd.read_csv("NBA_dataset.csv")
df_1.head()
|  | season | player | pos | age | team_id | g | gs | mp_per_g | fg_per_g | fga_per_g | ... | ws | ws_per_48 | obpm | dbpm | bpm | vorp | award_share | mov | mov_adj | win_loss_pct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1982 | Kareem Abdul-Jabbar | C | 34 | LAL | 76 | 76 | 35.2 | 9.9 | 17.1 | ... | 10.7 | 0.192 | 3.8 | 1.2 | 5.0 | 4.7 | 0.045 | 4.87 | 4.37 | 0.695 |
| 1 | 1982 | Alvan Adams | C | 27 | PHO | 79 | 75 | 30.3 | 6.4 | 13.0 | ... | 7.2 | 0.144 | 1.4 | 2.2 | 3.6 | 3.4 | 0.000 | 3.45 | 3.05 | 0.561 |
| 2 | 1982 | Mark Aguirre | SF | 22 | DAL | 51 | 20 | 28.8 | 7.5 | 16.1 | ... | 1.9 | 0.061 | 2.3 | -1.6 | 0.7 | 1.0 | 0.000 | -4.43 | -4.48 | 0.341 |
| 3 | 1982 | Danny Ainge | SG | 22 | BOS | 53 | 1 | 10.6 | 1.5 | 4.2 | ... | 0.5 | 0.042 | -3.7 | 1.0 | -2.7 | -0.1 | 0.000 | 6.38 | 6.35 | 0.768 |
| 4 | 1982 | Tiny Archibald | PG | 33 | BOS | 68 | 51 | 31.9 | 4.5 | 9.6 | ... | 5.2 | 0.115 | 1.4 | -1.3 | 0.1 | 1.1 | 0.000 | 6.38 | 6.35 | 0.768 |
5 rows × 55 columns
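Before any model can learn from past MVPs, each season needs a label saying who won. One possible way to derive it from the `award_share` column in the preview is sketched below. It assumes that `award_share` is the MVP vote share and that the winner is the player with the largest share in each season; the toy rows are made up, and `label_mvp` and `is_mvp` are hypothetical names.

```python
import pandas as pd

# Toy rows with made-up values; only the column names come from the preview.
toy = pd.DataFrame({
    "season": [1982, 1982, 1982, 1983, 1983],
    "player": ["A", "B", "C", "A", "D"],
    "award_share": [0.045, 0.735, 0.000, 0.200, 0.960],
})


def label_mvp(history):
    """Add a boolean is_mvp column marking each season's top vote-getter."""
    out = history.copy()
    # Assumption: the MVP is the player with the largest award_share that season.
    top = out.groupby("season")["award_share"].transform("max")
    # Require a positive share so a season with no votes yields no MVP.
    out["is_mvp"] = (out["award_share"] == top) & (top > 0)
    return out


labeled = label_mvp(toy)
print(labeled[labeled["is_mvp"]])
```

On the real data this would be called as `label_mvp(df_1)`; exact ties in `award_share` would mark more than one player and would need a tie-break rule.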
To solve this problem, we will use the second dataset loaded above (the 1982-2022 MVP data) to find the stats most heavily emphasized when choosing the MVP. We will then use a k-nearest neighbors (KNN) classifier to find the current player whose stats most closely match those of past MVPs. One problem we may face is the difference in stats between eras: scoring now is much easier than in previous years, so PPG and APG might be inflated, which means we will have to standardize these stats to a common scale. Games played is also very important, so it will need more weight than a basic stat like minutes per game. To find the All-NBA First Team, we will build a ranking system based on each player's stats and game metrics, rank the guards, forwards, and centers separately, and then choose the top 2, 2, and 1 at those positions, respectively.
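The era adjustment and the nearest-neighbor matching described above can be sketched together. This is a minimal illustration, not the project's final method: stats are z-scored within each season so that a player is compared with his own era, and a hand-rolled k-nearest-neighbors vote (pandas and NumPy only) reports what fraction of a candidate's closest historical neighbors were MVPs. scikit-learn's `KNeighborsClassifier` would be the usual tool for the same job. The toy values, the `is_mvp` label column, and the function names are all hypothetical; only `season`, `fg_per_g`, and `ws_per_48` come from the dataset preview.

```python
import numpy as np
import pandas as pd


def standardize_by_season(df, cols):
    """Z-score each stat within its own season to dampen era effects."""
    out = df.copy()
    grouped = out.groupby("season")[cols]
    # Sample standard deviation; each season needs at least two differing rows.
    out[cols] = (out[cols] - grouped.transform("mean")) / grouped.transform("std")
    return out


def knn_mvp_share(train, candidates, cols, k=3):
    """Fraction of each candidate's k nearest historical players who were MVP."""
    X = train[cols].to_numpy(dtype=float)
    y = train["is_mvp"].to_numpy(dtype=float)
    Q = candidates[cols].to_numpy(dtype=float)
    # Pairwise Euclidean distances, shape (n_candidates, n_train).
    dist = np.linalg.norm(Q[:, None, :] - X[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    share = y[nearest].mean(axis=1)
    return pd.Series(share, index=candidates.index, name="mvp_neighbor_share")


cols = ["fg_per_g", "ws_per_48"]

# Made-up history: raw numbers are higher in the later season.
history = pd.DataFrame({
    "season": [1982, 1982, 1982, 2022, 2022, 2022],
    "fg_per_g": [9.0, 6.0, 3.0, 12.0, 8.0, 4.0],
    "ws_per_48": [0.25, 0.15, 0.05, 0.30, 0.18, 0.06],
    "is_mvp": [True, False, False, True, False, False],
})

# Made-up current-season candidates.
current = pd.DataFrame({
    "season": [2023, 2023, 2023],
    "player": ["X", "Y", "Z"],
    "fg_per_g": [11.0, 7.0, 3.0],
    "ws_per_48": [0.28, 0.16, 0.04],
})

train = standardize_by_season(history, cols)
cands = standardize_by_season(current, cols)
shares = knn_mvp_share(train, cands, cols, k=2)
print(current.assign(mvp_neighbor_share=shares))
```

To give games played more influence than minutes per game, one option is to multiply its standardized column by a weight greater than 1 before computing distances, since larger-scaled features dominate Euclidean distance.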