Predicting NBA Hall of Fame Players

Motivation and Real-World Problem

I'm a huge fan of sports, especially basketball. I would like to use these datasets to predict which awards and statistics make a professional NBA player "worthy" of the Hall of Fame. By combining an outside source listing the current Hall of Fame inductees with these datasets, we will create a "standard": the absolute minimum a basketball player should achieve over his career to be inducted into the Hall of Fame. There are certain current players we know will make it into the Hall of Fame, e.g. LeBron James, but what can we say about other stars? Will Derrick Rose, a former MVP, make it? I hope to build a prediction model to answer questions like these.

Dataset

In [15]:
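import pandas as pd

# Career span, position, height (ft-in), weight (lb), birth date and college: 4,550 players
nba_player_df = pd.read_csv('player_data.csv')
# Height (cm), weight (kg), college and birthplace: 3,922 players (biographical data despite the name)
nba_stats_df = pd.read_csv('Players.csv')
# One row per player, team and season from 1950 to 2017: 24,691 rows x 53 columns
nba_season_stats_df = pd.read_csv('Seasons_Stats.csv')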
In [16]:
nba_stats_df
Out[16]:
      Unnamed: 0  Player             height  weight  collage                          born    birth_city      birth_state
0     0           Curly Armstrong    180.0   77.0    Indiana University               1918.0  NaN             NaN
1     1           Cliff Barker       188.0   83.0    University of Kentucky           1921.0  Yorktown        Indiana
2     2           Leo Barnhorst      193.0   86.0    University of Notre Dame         1924.0  NaN             NaN
3     3           Ed Bartels         196.0   88.0    North Carolina State University  1925.0  NaN             NaN
4     4           Ralph Beard        178.0   79.0    University of Kentucky           1927.0  Hardinsburg     Kentucky
...   ...         ...                ...     ...     ...                              ...     ...             ...
3917  3917        Troy Williams      198.0   97.0    South Carolina State University  1969.0  Columbia        South Carolina
3918  3918        Kyle Wiltjer       208.0   108.0   Gonzaga University               1992.0  Portland        Oregon
3919  3919        Stephen Zimmerman  213.0   108.0   University of Nevada, Las Vegas  1996.0  Hendersonville  Tennessee
3920  3920        Paul Zipser        203.0   97.0    NaN                              1994.0  Heidelberg      Germany
3921  3921        Ivica Zubac        216.0   120.0   NaN                              1997.0  Mostar          Bosnia and Herzegovina

3922 rows × 8 columns

This DataFrame gives us the city and state (or country, for players born outside the US) where each player was born, and the college he attended where applicable, since some players did not attend college. Note that the college column is named "collage" in this file. I would like to see whether playing at a certain college correlates with making it into the Hall of Fame. It would also be interesting to see whether a player's home state or country affects his HOF potential.
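A minimal sketch of that comparison follows. None of these files contains a Hall of Fame flag, so it assumes a hypothetical `hof_names` set of inductee names collected from an outside source; the helper name `hof_rate_by_group` is also illustrative. It counts players and inductees per value of a grouping column such as `collage` or `birth_state` in `nba_stats_df`.

```python
import pandas as pd


def hof_rate_by_group(players: pd.DataFrame, hof_names: set, group_col: str) -> pd.DataFrame:
    """Count players and Hall of Famers per group and return the HOF share.

    `hof_names` is a hypothetical set of inductee names from an outside source.
    """
    df = players[["Player", group_col]].copy()
    # groupby drops NaN keys, so label missing values (e.g. no college) explicitly
    df[group_col] = df[group_col].fillna("Unknown")
    df["in_hof"] = df["Player"].isin(hof_names)
    summary = df.groupby(group_col)["in_hof"].agg(players="count", hof_players="sum")
    summary["hof_rate"] = summary["hof_players"] / summary["players"]
    return summary.sort_values("hof_rate", ascending=False)


# Made-up rows; in the notebook, pass nba_stats_df and group by "collage" or "birth_state"
example = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D"],
    "collage": ["Duke University", "Duke University", None, "Gonzaga University"],
})
print(hof_rate_by_group(example, {"Player A"}, "collage"))
```

Keeping the raw `players` count next to the rate matters here: a college with one inductee out of two players would otherwise outrank a program with many players and several inductees.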

In [17]:
nba_player_df
Out[17]:
      name                 year_start  year_end  position  height  weight  birth_date         college
0     Alaa Abdelnaby       1991        1995      F-C       6-10    240.0   June 24, 1968      Duke University
1     Zaid Abdul-Aziz      1969        1978      C-F       6-9     235.0   April 7, 1946      Iowa State University
2     Kareem Abdul-Jabbar  1970        1989      C         7-2     225.0   April 16, 1947     University of California, Los Angeles
3     Mahmoud Abdul-Rauf   1991        2001      G         6-1     162.0   March 9, 1969      Louisiana State University
4     Tariq Abdul-Wahad    1998        2003      F         6-6     223.0   November 3, 1974   San Jose State University
...   ...                  ...         ...       ...       ...     ...     ...                ...
4545  Ante Zizic           2018        2018      F-C       6-11    250.0   January 4, 1997    NaN
4546  Jim Zoet             1983        1983      C         7-1     240.0   December 20, 1953  Kent State University
4547  Bill Zopf            1971        1971      G         6-1     170.0   June 7, 1948       Duquesne University
4548  Ivica Zubac          2017        2018      C         7-1     265.0   March 18, 1997     NaN
4549  Matt Zunic           1949        1949      G-F       6-3     195.0   December 19, 1919  George Washington University

4550 rows × 8 columns

This DataFrame is similar to the one above, but it covers more players (4,550 rows versus 3,922) and adds each player's position and the years his career started and ended. It may be better suited to a prediction model because it lets us compare players within the same position rather than across the league as a whole. For example, a Hall of Fame center will have very different stats from a Hall of Fame point guard.
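Comparing players within a position needs a little preprocessing first: in this file `height` is a feet-inches string such as `6-10`, and `position` may list two positions such as `F-C`. The sketch below converts height to inches and takes the first listed position as the primary one; that rule, and the helper names `height_to_inches` and `add_position_features`, are assumptions for illustration.

```python
import pandas as pd


def height_to_inches(height) -> float:
    """Convert a feet-inches string such as '6-10' into total inches (82.0)."""
    if pd.isna(height):
        return float("nan")
    feet, inches = str(height).split("-")
    return float(int(feet) * 12 + int(inches))


def add_position_features(players: pd.DataFrame) -> pd.DataFrame:
    """Add numeric height and a primary position to a copy of nba_player_df-style data."""
    df = players.copy()
    df["height_in"] = df["height"].map(height_to_inches)
    # Assumption: the first listed position ('F' in 'F-C') is the primary one
    df["primary_position"] = df["position"].str.split("-").str[0]
    return df


example = pd.DataFrame({"name": ["Alaa Abdelnaby", "Bill Zopf"],
                        "height": ["6-10", "6-1"],
                        "position": ["F-C", "G"]})
print(add_position_features(example))
```

Grouping the result by `primary_position` would let each player be measured against Hall of Famers at his own position.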

In [18]:
nba_season_stats_df
Out[18]:
       Unnamed: 0  Year    Player             Pos  Age   Tm   G     GS    MP      PER   ...  FT%    ORB    DRB    TRB    AST    STL   BLK   TOV   PF     PTS
0      0           1950.0  Curly Armstrong    G-F  31.0  FTW  63.0  NaN   NaN     NaN   ...  0.705  NaN    NaN    NaN    176.0  NaN   NaN   NaN   217.0  458.0
1      1           1950.0  Cliff Barker       SG   29.0  INO  49.0  NaN   NaN     NaN   ...  0.708  NaN    NaN    NaN    109.0  NaN   NaN   NaN   99.0   279.0
2      2           1950.0  Leo Barnhorst      SF   25.0  CHS  67.0  NaN   NaN     NaN   ...  0.698  NaN    NaN    NaN    140.0  NaN   NaN   NaN   192.0  438.0
3      3           1950.0  Ed Bartels         F    24.0  TOT  15.0  NaN   NaN     NaN   ...  0.559  NaN    NaN    NaN    20.0   NaN   NaN   NaN   29.0   63.0
4      4           1950.0  Ed Bartels         F    24.0  DNN  13.0  NaN   NaN     NaN   ...  0.548  NaN    NaN    NaN    20.0   NaN   NaN   NaN   27.0   59.0
...    ...         ...     ...                ...  ...   ...  ...   ...   ...     ...   ...  ...    ...    ...    ...    ...    ...   ...   ...   ...    ...
24686  24686       2017.0  Cody Zeller        PF   24.0  CHO  62.0  58.0  1725.0  16.7  ...  0.679  135.0  270.0  405.0  99.0   62.0  58.0  65.0  189.0  639.0
24687  24687       2017.0  Tyler Zeller       C    27.0  BOS  51.0  5.0   525.0   13.0  ...  0.564  43.0   81.0   124.0  42.0   7.0   21.0  20.0  61.0   178.0
24688  24688       2017.0  Stephen Zimmerman  C    20.0  ORL  19.0  0.0   108.0   7.3   ...  0.600  11.0   24.0   35.0   4.0    2.0   5.0   3.0   17.0   23.0
24689  24689       2017.0  Paul Zipser        SF   22.0  CHI  44.0  18.0  843.0   6.9   ...  0.775  15.0   110.0  125.0  36.0   15.0  16.0  40.0  78.0   240.0
24690  24690       2017.0  Ivica Zubac        C    19.0  LAL  38.0  11.0  609.0   17.0  ...  0.653  41.0   118.0  159.0  30.0   14.0  33.0  30.0  66.0   284.0

24691 rows × 53 columns

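This DataFrame gives us the season-by-season statistics of every player who played from 1950 through 2017. The counting stats are season totals rather than per-game averages (for example, points (PTS), rebounds (TRB), assists (AST), blocks (BLK), and turnovers (TOV)), so per-game numbers have to be computed by dividing by games played (G). Players who changed teams mid-season appear once per team plus once with Tm = TOT for the combined season, as Ed Bartels does in 1950. What I hope to do is average each player's stats over his career and plot them in a line graph. Other columns include age, team, and position. The biggest issue with this data is that certain stats were not recorded in the league's early years: offensive rebounds, PER, and several others are NaN for the 1950 seasons. I also hope to use outside sources that tell us whether each player is in the Hall of Fame.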
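A minimal sketch of the career-average step, assuming the `Year`, `Player`, `Tm`, `G`, `PTS`, `TRB` and `AST` columns shown above. For seasons split across teams it keeps only the `TOT` row, then sums each player's totals and divides by career games. Two caveats: players who share a name are merged, and `sum` treats missing totals as zero, so averages for stats that were not tracked in some seasons come out too low. The function name `career_per_game` is hypothetical.

```python
import pandas as pd

STATS = ["PTS", "TRB", "AST"]


def career_per_game(seasons: pd.DataFrame) -> pd.DataFrame:
    """Career per-game averages from Seasons_Stats-style season totals."""
    df = seasons[["Year", "Player", "Tm", "G"] + STATS].dropna(subset=["Player", "Year"])
    # A player traded mid-season has one row per team plus a 'TOT' row with the
    # combined totals; keep only the TOT row for those seasons to avoid double counting
    is_tot = df["Tm"].eq("TOT")
    season_has_tot = is_tot.groupby([df["Player"], df["Year"]]).transform("any").astype(bool)
    df = df[~season_has_tot | is_tot]
    # Sum totals over the career (NaN counts as 0 here), then divide by games played
    totals = df.groupby("Player")[["G"] + STATS].sum()
    per_game = totals[STATS].div(totals["G"], axis=0)
    per_game.columns = [f"{stat}_per_game" for stat in STATS]
    return per_game.join(totals["G"].rename("career_games"))


# Made-up seasons: 1950 was split across two teams and also has a TOT row
example = pd.DataFrame({
    "Year": [1950.0, 1950.0, 1950.0, 1951.0],
    "Player": ["Player A"] * 4,
    "Tm": ["TOT", "AAA", "BBB", "BBB"],
    "G": [15.0, 13.0, 2.0, 10.0],
    "PTS": [63.0, 59.0, 4.0, 37.0],
    "TRB": [20.0, 18.0, 2.0, 30.0],
    "AST": [20.0, 20.0, 0.0, 5.0],
})
print(career_per_game(example))
```

Weighting by games (total points over total games) rather than averaging each season's per-game figure keeps short, injury-shortened seasons from counting as much as full ones.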

How the data will be used

I will cluster each player's career averages (PPG, REB, AST) into groups. I will then rank the players from highest to lowest and find the threshold for what is considered a Hall of Fame player. Other things I may incorporate into the dataset include championships and All-NBA, All-Star, and All-Defensive Team appearances. I would love to create a graph to visualize the threshold!
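The threshold step can be sketched with pandas alone. This sketch does not perform any clustering: it simply takes a low quantile of each stat among known Hall of Famers as the cut-off and flags every player who meets all of them. It assumes a DataFrame of career averages indexed by player name and a hypothetical `hof_names` set from an outside source; the function names and the `quantile` parameter are illustrative. Accolades such as All-Star or All-NBA selections could be added as extra columns and thresholded the same way.

```python
import pandas as pd


def hof_thresholds(career: pd.DataFrame, hof_names: set, quantile: float = 0.0) -> pd.Series:
    """Per-stat cut-offs taken from known Hall of Famers.

    quantile=0.0 is the absolute minimum among inductees; a value such as 0.1
    is less sensitive to a single outlier.
    """
    hof = career[career.index.isin(hof_names)]
    return hof.quantile(quantile)


def clears_all_thresholds(career: pd.DataFrame, thresholds: pd.Series) -> pd.Series:
    """True for players whose stats all meet or exceed the cut-offs."""
    return career[thresholds.index].ge(thresholds).all(axis=1)


# Made-up career averages indexed by player name
career = pd.DataFrame(
    {"PTS_per_game": [25.0, 20.0, 22.0, 10.0], "AST_per_game": [5.0, 7.0, 6.0, 2.0]},
    index=["Player A", "Player B", "Player C", "Player D"],
)
thresholds = hof_thresholds(career, {"Player A", "Player B"})
print(thresholds)
print(clears_all_thresholds(career, thresholds))
```

With `quantile=0.0` this is the "absolute minimum" standard from the motivation; because a single low-scoring inductee drags every cut-off down, a small positive quantile is worth comparing against it.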