I'm a huge fan of sports, especially basketball. I would like to use these datasets to determine which awards and statistics make a professional NBA player "worthy" of the Hall of Fame (HOF). Using an outside source that lists the current Hall of Fame members together with these datasets, I will create a "standard": the absolute minimum a basketball player should achieve over their career to be inducted into the Hall of Fame. Some current players are all but certain to make it, e.g. LeBron James, but what can we say about other stars? Will Derrick Rose, a former MVP, make it? I hope to build a prediction model to answer that.
import pandas as pd
nba_player_df = pd.read_csv('player_data.csv')
nba_stats_df = pd.read_csv('Players.csv')
nba_season_stats_df = pd.read_csv('Seasons_Stats.csv')
nba_stats_df
 | Unnamed: 0 | Player | height | weight | collage | born | birth_city | birth_state |
---|---|---|---|---|---|---|---|---|
0 | 0 | Curly Armstrong | 180.0 | 77.0 | Indiana University | 1918.0 | NaN | NaN |
1 | 1 | Cliff Barker | 188.0 | 83.0 | University of Kentucky | 1921.0 | Yorktown | Indiana |
2 | 2 | Leo Barnhorst | 193.0 | 86.0 | University of Notre Dame | 1924.0 | NaN | NaN |
3 | 3 | Ed Bartels | 196.0 | 88.0 | North Carolina State University | 1925.0 | NaN | NaN |
4 | 4 | Ralph Beard | 178.0 | 79.0 | University of Kentucky | 1927.0 | Hardinsburg | Kentucky |
... | ... | ... | ... | ... | ... | ... | ... | ... |
3917 | 3917 | Troy Williams | 198.0 | 97.0 | South Carolina State University | 1969.0 | Columbia | South Carolina |
3918 | 3918 | Kyle Wiltjer | 208.0 | 108.0 | Gonzaga University | 1992.0 | Portland | Oregon |
3919 | 3919 | Stephen Zimmerman | 213.0 | 108.0 | University of Nevada, Las Vegas | 1996.0 | Hendersonville | Tennessee |
3920 | 3920 | Paul Zipser | 203.0 | 97.0 | NaN | 1994.0 | Heidelberg | Germany |
3921 | 3921 | Ivica Zubac | 216.0 | 120.0 | NaN | 1997.0 | Mostar | Bosnia and Herzegovina |
3922 rows × 8 columns
This DataFrame gives us the city and state each player was born in and the college the player attended (if applicable; some players did not attend college). I would like to see whether playing at a certain college correlates with making it into the Hall of Fame. It would also be interesting to see whether a player's birth state or country affects their HOF potential.
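As a rough sketch of the college question, the helper below computes the share of Hall of Famers per group. It assumes a set of inducted player names, `hof_names`, loaded from the outside HOF source; that set, the function name `hof_rate_by_group`, and the `min_players` cutoff are hypothetical and not part of the datasets above. The column name `collage` is spelled the way it appears in `Players.csv`.

```python
import pandas as pd


def hof_rate_by_group(players, hof_names, group_col="collage", min_players=1):
    """Share of players in each group (college, birth_state, ...) who are in hof_names.

    hof_names is an assumed set of Hall of Fame player names from an outside source.
    """
    # Drop players with no value for the grouping column (e.g. no college).
    df = players[["Player", group_col]].dropna(subset=[group_col]).copy()
    df["is_hof"] = df["Player"].isin(list(hof_names))

    summary = (
        df.groupby(group_col)["is_hof"]
        .agg(players="size", hof="sum", hof_rate="mean")
        .reset_index()
    )
    # Very small groups give noisy rates, so allow filtering them out.
    summary = summary[summary["players"] >= min_players]
    return summary.sort_values(
        ["hof_rate", "players"], ascending=False
    ).reset_index(drop=True)
```

Calling `hof_rate_by_group(nba_stats_df, hof_names, group_col="birth_state")` would give the same summary by birth state. Matching on names alone is fragile (spelling variants, players who share a name), so the rates should be treated as approximate.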
nba_player_df
 | name | year_start | year_end | position | height | weight | birth_date | college |
---|---|---|---|---|---|---|---|---|
0 | Alaa Abdelnaby | 1991 | 1995 | F-C | 6-10 | 240.0 | June 24, 1968 | Duke University |
1 | Zaid Abdul-Aziz | 1969 | 1978 | C-F | 6-9 | 235.0 | April 7, 1946 | Iowa State University |
2 | Kareem Abdul-Jabbar | 1970 | 1989 | C | 7-2 | 225.0 | April 16, 1947 | University of California, Los Angeles |
3 | Mahmoud Abdul-Rauf | 1991 | 2001 | G | 6-1 | 162.0 | March 9, 1969 | Louisiana State University |
4 | Tariq Abdul-Wahad | 1998 | 2003 | F | 6-6 | 223.0 | November 3, 1974 | San Jose State University |
... | ... | ... | ... | ... | ... | ... | ... | ... |
4545 | Ante Zizic | 2018 | 2018 | F-C | 6-11 | 250.0 | January 4, 1997 | NaN |
4546 | Jim Zoet | 1983 | 1983 | C | 7-1 | 240.0 | December 20, 1953 | Kent State University |
4547 | Bill Zopf | 1971 | 1971 | G | 6-1 | 170.0 | June 7, 1948 | Duquesne University |
4548 | Ivica Zubac | 2017 | 2018 | C | 7-1 | 265.0 | March 18, 1997 | NaN |
4549 | Matt Zunic | 1949 | 1949 | G-F | 6-3 | 195.0 | December 19, 1919 | George Washington University |
4550 rows × 8 columns
This DataFrame is similar to the one above, but it covers more players (4,550 rows versus 3,922) and adds each player's position and first and last season. It may be more useful for building a prediction model because we can compare players by position rather than all players as a whole. For example, a Hall of Fame center will have very different stats from a Hall of Fame point guard.
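Before comparing by position, this table needs a little preparation: height is stored as a feet-inches string such as `6-10`, and positions can be combined (`F-C`). A minimal sketch follows; the helper names and the new columns `height_cm`, `primary_pos`, and `career_years` are my own, and it assumes that `year_start` and `year_end` are inclusive season years and that the first listed position is the primary one. The demo rows are copied from the output above.

```python
import pandas as pd


def height_to_cm(height):
    """Convert a 'feet-inches' string such as '6-10' to centimetres."""
    if pd.isna(height):
        return float("nan")
    feet, inches = str(height).split("-")
    return round((int(feet) * 12 + int(inches)) * 2.54, 1)


def add_position_features(players):
    """Return a copy with height in cm, primary position, and career length."""
    out = players.copy()
    out["height_cm"] = out["height"].map(height_to_cm)
    # Assumption: in 'F-C' the first position listed is the primary one.
    out["primary_pos"] = out["position"].str.split("-").str[0]
    # Assumption: both endpoints are seasons the player appeared in.
    out["career_years"] = out["year_end"] - out["year_start"] + 1
    return out


# Three rows taken from the nba_player_df output above.
demo = pd.DataFrame(
    {
        "name": ["Alaa Abdelnaby", "Kareem Abdul-Jabbar", "Mahmoud Abdul-Rauf"],
        "year_start": [1991, 1970, 1991],
        "year_end": [1995, 1989, 2001],
        "position": ["F-C", "C", "G"],
        "height": ["6-10", "7-2", "6-1"],
    }
)
print(add_position_features(demo)[["name", "primary_pos", "height_cm", "career_years"]])
```

With `primary_pos` in place, later comparisons can be run per position with `groupby("primary_pos")`.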
nba_season_stats_df
 | Unnamed: 0 | Year | Player | Pos | Age | Tm | G | GS | MP | PER | ... | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1950.0 | Curly Armstrong | G-F | 31.0 | FTW | 63.0 | NaN | NaN | NaN | ... | 0.705 | NaN | NaN | NaN | 176.0 | NaN | NaN | NaN | 217.0 | 458.0 |
1 | 1 | 1950.0 | Cliff Barker | SG | 29.0 | INO | 49.0 | NaN | NaN | NaN | ... | 0.708 | NaN | NaN | NaN | 109.0 | NaN | NaN | NaN | 99.0 | 279.0 |
2 | 2 | 1950.0 | Leo Barnhorst | SF | 25.0 | CHS | 67.0 | NaN | NaN | NaN | ... | 0.698 | NaN | NaN | NaN | 140.0 | NaN | NaN | NaN | 192.0 | 438.0 |
3 | 3 | 1950.0 | Ed Bartels | F | 24.0 | TOT | 15.0 | NaN | NaN | NaN | ... | 0.559 | NaN | NaN | NaN | 20.0 | NaN | NaN | NaN | 29.0 | 63.0 |
4 | 4 | 1950.0 | Ed Bartels | F | 24.0 | DNN | 13.0 | NaN | NaN | NaN | ... | 0.548 | NaN | NaN | NaN | 20.0 | NaN | NaN | NaN | 27.0 | 59.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
24686 | 24686 | 2017.0 | Cody Zeller | PF | 24.0 | CHO | 62.0 | 58.0 | 1725.0 | 16.7 | ... | 0.679 | 135.0 | 270.0 | 405.0 | 99.0 | 62.0 | 58.0 | 65.0 | 189.0 | 639.0 |
24687 | 24687 | 2017.0 | Tyler Zeller | C | 27.0 | BOS | 51.0 | 5.0 | 525.0 | 13.0 | ... | 0.564 | 43.0 | 81.0 | 124.0 | 42.0 | 7.0 | 21.0 | 20.0 | 61.0 | 178.0 |
24688 | 24688 | 2017.0 | Stephen Zimmerman | C | 20.0 | ORL | 19.0 | 0.0 | 108.0 | 7.3 | ... | 0.600 | 11.0 | 24.0 | 35.0 | 4.0 | 2.0 | 5.0 | 3.0 | 17.0 | 23.0 |
24689 | 24689 | 2017.0 | Paul Zipser | SF | 22.0 | CHI | 44.0 | 18.0 | 843.0 | 6.9 | ... | 0.775 | 15.0 | 110.0 | 125.0 | 36.0 | 15.0 | 16.0 | 40.0 | 78.0 | 240.0 |
24690 | 24690 | 2017.0 | Ivica Zubac | C | 19.0 | LAL | 38.0 | 11.0 | 609.0 | 17.0 | ... | 0.653 | 41.0 | 118.0 | 159.0 | 30.0 | 14.0 | 33.0 | 30.0 | 66.0 | 284.0 |
24691 rows × 53 columns
This DataFrame gives us the statistics of every player who played from 1950 through 2017, with one row per player, season, and team. It includes season totals such as points (PTS), rebounds, blocks, and turnovers (TOV), along with each player's age, team, and position. What I hope to do is average each player's stats over their career and plot them as a line graph. The biggest issue with this data is that certain stats, such as offensive rebounds and PER, were not recorded in the early years and show up as NaN. I also hope to use outside sources that tell us whether a player is in the Hall of Fame.
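One way to get career averages from these season totals is sketched below. Two things in the output above shape it: a player who changed teams mid-season has a `TOT` row plus one row per team (see Ed Bartels in 1950), so only the `TOT` row is kept to avoid double counting, and stats that were never recorded stay NaN instead of becoming 0. The function name `career_per_game` and the `*_per_g` column names are my own, and grouping by name assumes player names are unique, which may not hold for the full file.

```python
import pandas as pd


def career_per_game(season_stats, stat_cols=("PTS", "TRB", "AST")):
    """Career per-game averages from season totals, counting traded seasons once."""
    df = season_stats.dropna(subset=["Player", "Year"]).copy()

    # If a player-season has a TOT (total) row, keep it and drop the per-team rows.
    df["is_tot"] = df["Tm"].eq("TOT")
    has_tot = df.groupby(["Player", "Year"])["is_tot"].transform("any")
    df = df[df["is_tot"] | ~has_tot]

    # min_count=1 keeps a stat as NaN when it was never recorded for a player.
    totals = df.groupby("Player")[["G", *stat_cols]].sum(min_count=1)
    per_game = totals[list(stat_cols)].div(totals["G"], axis=0)
    per_game.columns = [f"{col}_per_g" for col in stat_cols]
    per_game["G"] = totals["G"]
    return per_game


# Rows taken from the nba_season_stats_df output above.
nan = float("nan")
demo = pd.DataFrame(
    {
        "Year": [1950.0, 1950.0, 1950.0],
        "Player": ["Curly Armstrong", "Ed Bartels", "Ed Bartels"],
        "Tm": ["FTW", "TOT", "DNN"],
        "G": [63.0, 15.0, 13.0],
        "TRB": [nan, nan, nan],
        "AST": [176.0, 20.0, 20.0],
        "PTS": [458.0, 63.0, 59.0],
    }
)
career = career_per_game(demo)
print(career)
```

For a career that spans the year a stat began to be recorded, dividing by all games played understates that stat, so dividing by games from recorded seasons only would be a reasonable refinement.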
I will cluster each player's career averages (PPG, REB, AST) into groups. I will then rank players from highest to lowest and find the threshold for what is considered a Hall of Fame player. Other things I may incorporate into the dataset include championships, All-NBA appearances, All-Star appearances, All-Defensive Team appearances, etc. I would love to create a graph to visualize the threshold!
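The ranking-and-threshold step could start from something like the sketch below. It is not a clustering algorithm: as a simpler stand-in, it standardizes each stat to a z-score, sums the z-scores into one composite score, ranks players by it, and takes the lowest score among known Hall of Famers as the threshold. It assumes a `career` DataFrame indexed by player name with one column per stat, and a set `hof_names` from the outside HOF source; both names, and the function `rank_and_threshold`, are hypothetical.

```python
import pandas as pd


def rank_and_threshold(career, hof_names, stat_cols):
    """Rank players by a summed z-score and return the lowest HOF score.

    career: DataFrame indexed by player name, one column per career stat.
    hof_names: assumed set of Hall of Fame player names from an outside source.
    """
    stat_cols = list(stat_cols)
    # Players missing any of the stats cannot be scored fairly, so drop them.
    complete = career.dropna(subset=stat_cols)

    stats = complete[stat_cols]
    z = (stats - stats.mean()) / stats.std(ddof=0)

    ranked = complete.assign(score=z.sum(axis=1)).sort_values("score", ascending=False)
    ranked["is_hof"] = ranked.index.isin(list(hof_names))

    # The "standard": the weakest composite score that still made the Hall of Fame.
    threshold = ranked.loc[ranked["is_hof"], "score"].min()
    return ranked, threshold
```

Plotting `ranked["score"]` with a horizontal line at `threshold` would give the visualization mentioned above. A single outlier inductee can drag the minimum down, so a low percentile of HOF scores may be a more stable cutoff than the strict minimum.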