NBA MVP, DPOY, and All-NBA First Team Predictions¶

Dhruv Rokkam¶

Problem¶

Tracking the metrics that determine a player's positive impact after the NBA All-Star Game has been difficult for basketball analysts over the past few decades, both because injuries and other factors make the league unpredictable and because nearly every analyst weighs those metrics differently.

Solution¶

Who will be the NBA's Most Valuable Player (MVP) and Defensive Player of the Year (DPOY), and which five players (2 guards, 2 forwards, 1 center) will make up the All-NBA First Team? The goal of this project is to determine which metrics have been most influential in deciding the league's MVP and to use them to predict the current season's MVP, DPOY, and All-NBA First Team.

Impact¶

If we can build a program that identifies the most important metrics for picking the MVP, we can use those metrics to predict this year's league MVP as well as the DPOY and All-NBA First Team.

Dataset¶

Details¶

We will use the Kaggle 1982-2022 NBA MVP Player Statistics Dataset to determine the metrics most important to choosing the MVP, and then use the Kaggle 2022-2023 Player Stats to find the current MVP leader, the Defensive Player of the Year, and the members of the All-NBA First Team.

Variables¶

1. Games Played (GP)
2. Minutes per Game (MPG)
3. Usage Percentage (USG%)
4. Turnover Percentage (TO%)
5. Effective Field Goal Percentage (eFG%)
6. True Shooting Percentage (TS%)
7. Points per Game (PPG)
8. Rebounds per Game (RPG)
9. Assists per Game (APG)
10. Steals + Blocks per Game (SPG + BPG)
11. Offensive Rating (ORtg)
12. Defensive Rating (DRtg)
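
Variable 10 is a combined stat rather than a single column, so it may help to see how the twelve variables could be assembled into one feature table. The following is only a minimal sketch on a made-up two-row sample: `sample`, `STOCKS`, and `FEATURES` are hypothetical names, the numbers are invented, and the column labels `eFG%`, `TS%`, `PPG`, and `RPG` are assumptions, since those columns are hidden behind the `...` in the preview of the real file.

```python
import pandas as pd

# Made-up sample rows (NOT real player data). Column labels for eFG%, TS%,
# PPG, and RPG are assumed; they are elided in the preview of the real CSV.
sample = pd.DataFrame({
    "NAME": ["Player A", "Player B"],
    "GP": [60, 55], "MPG": [36.0, 34.0], "USG%": [25.0, 30.0], "TO%": [11.0, 15.0],
    "eFG%": [0.52, 0.50], "TS%": [0.56, 0.58], "PPG": [20.0, 26.0], "RPG": [5.0, 3.0],
    "APG": [6.0, 10.0], "SPG": [1.5, 1.0], "BPG": [0.5, 0.25],
    "ORtg": [112.0, 116.0], "DRtg": [114.0, 117.0],
})

# Variable 10: steals and blocks are separate columns, so add them together.
sample["STOCKS"] = sample["SPG"] + sample["BPG"]

# The twelve variables from the list above, in the same order.
FEATURES = ["GP", "MPG", "USG%", "TO%", "eFG%", "TS%", "PPG", "RPG", "APG",
            "STOCKS", "ORtg", "DRtg"]
features = sample[["NAME"] + FEATURES]
print(features)
```

With the real file, the same two steps (derive the combined column, then select the feature list) would apply to `df` once the actual column labels are confirmed.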
In [36]:
import pandas as pd
df = pd.read_csv("NBA Stats 202223 All Stats  NBA Player Props Tool.csv")
df.head()
Out[36]:
RANK NAME TEAM POS AGE GP MPG USG% TO% FTA ... APG SPG BPG TPG P+R P+A P+R+A VI ORtg DRtg
0 1 Dejounte Murray Atl G 26.4 56 36.4 24.8 11.1 135 ... 6.1 1.5 0.3 2.3 26.5 27.0 32.5 2.4 9.6 114.4
1 2 Trae Young Atl G 24.4 54 35.3 33.2 17.2 478 ... 10.2 1.1 0.1 4.1 29.9 37.2 40.1 1.6 11.7 116.6
2 3 De'Andre Hunter Atl F-G 25.2 51 31.7 19.8 9.1 176 ... 1.4 0.5 0.2 1.2 20.0 17.1 21.4 1.3 16.6 110.1
3 4 John Collins Atl F-C 25.4 51 31.0 17.0 10.6 128 ... 1.2 0.7 1.2 1.2 20.2 14.4 21.5 2.8 18.2 106.7
4 5 Bogdan Bogdanovic Atl G 30.5 36 29.6 20.2 9.2 50 ... 3.0 0.9 0.3 1.3 18.0 17.6 21.0 2.0 40.7 114.0

5 rows × 29 columns

In [39]:
df_1 = pd.read_csv("NBA_dataset.csv")
df_1.head()
Out[39]:
season player pos age team_id g gs mp_per_g fg_per_g fga_per_g ... ws ws_per_48 obpm dbpm bpm vorp award_share mov mov_adj win_loss_pct
0 1982 Kareem Abdul-Jabbar C 34 LAL 76 76 35.2 9.9 17.1 ... 10.7 0.192 3.8 1.2 5.0 4.7 0.045 4.87 4.37 0.695
1 1982 Alvan Adams C 27 PHO 79 75 30.3 6.4 13.0 ... 7.2 0.144 1.4 2.2 3.6 3.4 0.000 3.45 3.05 0.561
2 1982 Mark Aguirre SF 22 DAL 51 20 28.8 7.5 16.1 ... 1.9 0.061 2.3 -1.6 0.7 1.0 0.000 -4.43 -4.48 0.341
3 1982 Danny Ainge SG 22 BOS 53 1 10.6 1.5 4.2 ... 0.5 0.042 -3.7 1.0 -2.7 -0.1 0.000 6.38 6.35 0.768
4 1982 Tiny Archibald PG 33 BOS 68 51 31.9 4.5 9.6 ... 5.2 0.115 1.4 -1.3 0.1 1.1 0.000 6.38 6.35 0.768

5 rows × 55 columns

Method¶

To solve this problem, we will use the second dataset loaded above (the 1982-2022 MVP statistics) to find which stats have carried the most weight in MVP voting. We will then fit a k-nearest neighbors (KNN) classifier and pick the current player whose stats most closely match those of past MVPs. One problem we may face is that stats differ across eras: scoring is easier now than in previous years, so PPG and APG may be inflated, which means we will have to standardize the stats to a common scale. Games played is also very important, so it will need more weight than a basic stat like minutes per game.

To find the All-NBA First Team, we will rate each guard, forward, and center with a ranking system based on their stats and game metrics, then choose the top 2 guards, 2 forwards, and 1 center.
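
One possible way to put these steps together is sketched below. It is an illustration under several assumptions, not a finished method: the data is made up, the stat columns `pts_per_g` and `ast_per_g` are assumed names (the two real files label their columns differently and would need renaming to match), the MVP of each season is assumed to be the player with the highest `award_share`, the era adjustment is done with per-season z-scores, hyphenated positions such as `F-C` are assigned to the first position listed, and the ranking score is simply the sum of the z-scores. The helpers `zscore_by_season` and `pick_first_team` are hypothetical.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

STATS = ["pts_per_g", "ast_per_g"]  # assumed column names, for illustration only

# Made-up historical rows shaped like the 1982-2022 table.
hist = pd.DataFrame({
    "season": [1990, 1990, 1990, 2020, 2020, 2020],
    "player": ["A", "B", "C", "D", "E", "F"],
    "pts_per_g": [30.0, 22.0, 14.0, 34.0, 26.0, 18.0],
    "ast_per_g": [8.0, 5.0, 2.0, 9.0, 6.0, 3.0],
    "award_share": [0.90, 0.10, 0.00, 0.95, 0.05, 0.00],
})

# Made-up current-season rows; "pos" mimics the G / F / C labels of the 2022-23 file.
current = pd.DataFrame({
    "season": [2023] * 6,
    "player": ["U", "V", "W", "X", "Y", "Z"],
    "pos": ["G", "G", "G-F", "F", "F-C", "C"],
    "pts_per_g": [33.0, 25.0, 21.0, 26.0, 18.0, 21.0],
    "ast_per_g": [10.0, 5.0, 4.0, 5.0, 2.0, 4.0],
})

def zscore_by_season(df, cols):
    """Rescale each stat to z-scores within its own season so eras are comparable."""
    out = df.copy()
    grouped = out.groupby("season")[cols]
    out[cols] = (out[cols] - grouped.transform("mean")) / grouped.transform("std")
    return out

# Label past MVPs (assumption: highest award share in each season).
hist_z = zscore_by_season(hist, STATS)
top_share = hist_z.groupby("season")["award_share"].transform("max")
hist_z["is_mvp"] = (hist_z["award_share"] == top_share).astype(int)

# Fit KNN on era-adjusted stats; k is tiny only because the sample is tiny.
knn = KNeighborsClassifier(n_neighbors=2)
knn.fit(hist_z[STATS], hist_z["is_mvp"])

# Score current players by the share of their nearest neighbors who were MVPs.
current_z = zscore_by_season(current, STATS)
current_z["mvp_score"] = knn.predict_proba(current_z[STATS])[:, 1]
mvp_pick = current_z.loc[current_z["mvp_score"].idxmax(), "player"]

# Simple ranking score for the All-NBA picks: sum of the z-scored stats.
current_z["rank_score"] = current_z[STATS].sum(axis=1)
# Assumption: a hyphenated position counts as the first position listed.
current_z["primary_pos"] = current_z["pos"].str.split("-").str[0]

def pick_first_team(df, slots):
    """Take the top-n players by rank_score for each position slot."""
    picks = [df[df["primary_pos"] == pos].nlargest(n, "rank_score")
             for pos, n in slots.items()]
    return pd.concat(picks)

first_team = pick_first_team(current_z, {"G": 2, "F": 2, "C": 1})
print(mvp_pick)
print(first_team[["player", "pos", "rank_score"]])
```

Because KNN relies on distances between players, the scale of each column acts as its weight. Giving games played more influence than minutes per game could therefore be done by multiplying its z-scored column by a factor greater than 1 before fitting; the right factor would have to be chosen by validation on past seasons.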