In [1]:
# NFL quarterback

''' Instructions:

Each individual student will submit a project proposal (3% of final grade) in .ipynb format which:

  1. (1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).

  2. (1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.

  3. (1%) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do. For example:

“We’ll cluster the movies into sets of movies which are often watched by the same users. Doing so allows us to discover if there is a more natural grouping of movies rather than the traditional genres: horror, comedy, romantic-comedy, etc”.

'''

1

'''

  1. I would like to target a problem in sports that I believe could benefit from data science. Specifically, I would like to look at the NFL and its most popular position: the quarterback. The problem is far more evident in the NCAA, but I believe it persists in the NFL, albeit on a much smaller scale. Every year, quarterback stats and ranking lists are highlighted by sports media. This coverage is meant, in part, to drum up attention or hype for the participating players and teams. What should be reserved for casual conversation has now made its way into contract negotiations, TV screens, and even entire webpages. If you are going to subjectively rank people and use those rankings as real-life ammunition, the least you can do is make them accurate. That is where I think data science comes into play. I believe both current and all-time quarterback stats should be adjusted in a number of ways before they are displayed. If you are comparing players from decades apart, you must account for several obvious variables, such as changes in rules, penalties, and playcalling. I believe some of the same principles apply when comparing players from the same season, playcalling in particular. I think how conservatively or aggressively a team is coached and playcalled has a much larger effect on the statistics of the team's quarterback than is realized. I would like to use data science to measure this.

https://www.nfl.com/news/nfl-qb-index-ranking-all-32-teams-primary-starting-quarterbacks-at-the-end-of-th

https://www.pro-football-reference.com/years/2022/passing.htm

https://www.pro-football-reference.com/years/2022/index.htm#all_team_stats

'''

In [7]:
#2

import pandas as pd

# Load the dataset from the URL
url = 'https://www.pro-football-reference.com/years/2022/passing.htm'
df = pd.read_html(url)[0]

# Display the first 25 rows of the dataset
print(df.head(25))

# Create a data dictionary that explains the meaning of each feature
data_dict = {
    'Rk': 'Rank of the player based on the selected statistic',
    'Player': 'Name of the player',
    'Tm': 'Name of the team the player belongs to',
    'Age': 'Age of the player',
    'Pos': 'Position of the player',
    'G': 'Number of games played by the player',
    'GS': 'Number of games started by the player',
    'Cmp': 'Total number of pass completions made by the player',
    'Att': 'Total number of pass attempts made by the player',
    'Cmp%': 'Percentage of pass completions made by the player',
    'Yds': 'Total number of passing yards gained by the player',
    'TD': 'Total number of passing touchdowns made by the player',
    'TD%': 'Percentage of passing attempts that resulted in a touchdown made by the player',
    'Int': 'Total number of interceptions thrown by the player',
    'Int%': 'Percentage of passing attempts that resulted in an interception thrown by the player',
    'Lng': 'Longest pass completed by the player in yards',
    'Y/A': 'Average number of yards gained per pass attempt by the player',
    'AY/A': 'Average number of adjusted yards gained per pass attempt by the player, taking into account touchdowns and interceptions',
    'Y/C': 'Average number of yards gained per pass completion by the player',
    'Y/G': 'Average number of passing yards gained per game by the player',
    'Rate': 'Passer rating of the player',
    'QBR': 'Total quarterback rating (QBR) of the player, a more comprehensive measure of quarterback performance',
}

# Columns that also appear in the table but are not described above:
#   QBrec  - team win-loss-tie record in games this quarterback started
#   Sk     - times sacked
#   Yds.1  - yards lost to sacks (pandas renames the second 'Yds' column)
#   Sk%    - percentage of dropbacks that ended in a sack
#   NY/A   - net yards gained per pass attempt (sack yards subtracted)
#   ANY/A  - adjusted net yards gained per pass attempt
#   4QC    - fourth-quarter comebacks led
#   GWD    - game-winning drives led

# Print the data dictionary
print(data_dict)




'''
I should probably include team stats as well, but I think this dataset is sufficient to demonstrate the problem.
Players are being compared as equals even though some have attempted 100, 200, or even 300 more passes
than their peers. This needs to be measured. Pitchers who can throw more pitches per game are more valuable, right?
That is not the same thing at all, I know, but it is still a variable that isn't strongly considered.


'''
    Rk             Player   Tm Age Pos   G  GS   QBrec  Cmp  Att  ...    Y/G  \
0    1  Patrick Mahomes*+  KAN  27  QB  17  17  14-3-0  435  648  ...  308.8   
1    2     Justin Herbert  LAC  24  QB  17  17  10-7-0  477  699  ...  278.8   
2    3          Tom Brady  TAM  45  QB  17  17   8-9-0  490  733  ...  276.1   
3    4      Kirk Cousins*  MIN  34  QB  17  17  13-4-0  424  643  ...  267.5   
4    5        Joe Burrow*  CIN  26  QB  16  16  12-4-0  414  606  ...  279.7   
5    6        Jared Goff*  DET  28  QB  17  17   9-8-0  382  587  ...  261.1   
6    7        Josh Allen*  BUF  26  QB  16  16  13-3-0  359  567  ...  267.7   
7    8        Geno Smith*  SEA  32  QB  17  17   9-8-0  399  572  ...  251.9   
8    9   Trevor Lawrence*  JAX  23  QB  17  17   9-8-0  387  584  ...  241.9   
9   10       Jalen Hurts*  PHI  24  QB  15  15  14-1-0  306  460  ...  246.7   
10  11      Aaron Rodgers  GNB  39  QB  17  17   8-9-0  350  542  ...  217.4   
11  12     Tua Tagovailoa  MIA  24  QB  13  13   8-5-0  259  400  ...  272.9   
12  13     Russell Wilson  DEN  34  QB  15  15  4-11-0  292  483  ...  234.9   
13  14        Derek Carr*  LVR  31  QB  15  15   6-9-0  305  502  ...  234.8   
14  15       Daniel Jones  NYG  25  QB  16  16   9-6-1  317  472  ...  200.3   
15  16        Davis Mills  HOU  24  QB  15  15  3-10-1  292  479  ...  207.9   
16  17          Matt Ryan  IND  37  QB  12  12   4-7-1  309  461  ...  254.8   
17  18          Mac Jones  NWE  24  QB  14  14   6-8-0  288  442  ...  214.1   
18  19        Andy Dalton  NOR  35  QB  14  14   6-8-0  252  378  ...  205.1   
19  20       Dak Prescott  DAL  29  QB  12  12   8-4-0  261  394  ...  238.3   
20  21    Jacoby Brissett  CLE  30  QB  16  11   4-7-0  236  369  ...  163.0   
21  22     Ryan Tannehill  TEN  34  QB  12  12   6-6-0  212  325  ...  211.3   
22  23    Jimmy Garoppolo  SFO  31  QB  11  10   7-3-0  207  308  ...  221.5   
23  24      Kenny Pickett  PIT  24  QB  13  12   7-5-0  245  389  ...  184.9   
24  25       Kyler Murray  ARI  25  QB  11  11   3-8-0  259  390  ...  215.3   

     Rate   QBR  Sk Yds.1   Sk%  NY/A ANY/A  4QC  GWD  
0   105.2  77.6  26   188   3.9  7.51  7.93    4    4  
1    93.2  58.3  38   206   5.2  6.15  6.22    4    5  
2    90.7  52.5  22   160   2.9  6.01  6.13    4    5  
3    92.5  49.9  46   329   6.7  6.12  6.05    8    8  
4   100.8  58.7  41   259   6.3  6.52  6.76    3    4  
5    99.3  61.1  23   156   3.8  7.02  7.45    3    3  
6    96.6  71.4  33   162   5.5  6.87  6.99    3    4  
7   100.9  60.8  46   348   7.4  6.37  6.54    2    3  
8    95.2  54.6  27   184   4.4  6.43  6.66    3    2  
9   101.5  66.3  38   231   7.6  6.97  7.31    1    2  
10   91.1  39.3  32   258   5.6  5.99  5.95    3    4  
11  105.5  68.8  21   163   5.0  8.04  8.37    2    2  
12   84.4  37.0  55   368  10.2  5.87  5.54    3    3  
13   86.3  55.6  27   191   5.1  6.30  6.01    4    3  
14   92.5  60.8  44   243   8.5  5.74  5.89    4    5  
15   78.8  33.2  31   244   6.1  5.64  4.98    2    3  
16   83.9  43.1  38   287   7.6  5.55  4.94    5    4  
17   84.8  36.1  34   231   7.1  5.81  5.36    0    0  
18   95.2  50.7  25   189   6.2  6.66  6.54    1    1  
19   91.1  57.9  20   126   4.8  6.60  6.08    2    2  
20   88.9  59.9  24   160   6.1  6.23  6.15    2    2  
21   94.6  49.2  33   238   9.2  6.42  6.39  NaN  NaN  
22  103.0  54.4  18   100   5.5  7.17  7.60    1    1  
23   76.7  51.6  27   182   6.5  5.34  4.70    3    4  
24   87.2  51.7  25   192   6.0  5.24  5.16    1    1  

[25 rows x 31 columns]
{'Rk': 'Rank of the player based on the selected statistic', 'Player': 'Name of the player', 'Tm': 'Name of the team the player belongs to', 'Age': 'Age of the player', 'Pos': 'Position of the player', 'G': 'Number of games played by the player', 'GS': 'Number of games started by the player', 'Cmp': 'Total number of pass completions made by the player', 'Att': 'Total number of pass attempts made by the player', 'Cmp%': 'Percentage of pass completions made by the player', 'Yds': 'Total number of passing yards gained by the player', 'TD': 'Total number of passing touchdowns made by the player', 'TD%': 'Percentage of passing attempts that resulted in a touchdown made by the player', 'Int': 'Total number of interceptions thrown by the player', 'Int%': 'Percentage of passing attempts that resulted in an interception thrown by the player', 'Lng': 'Longest pass completed by the player in yards', 'Y/A': 'Average number of yards gained per pass attempt by the player', 'AY/A': 'Average number of adjusted yards gained per pass attempt by the player, taking into account touchdowns and interceptions', 'Y/C': 'Average number of yards gained per pass completion by the player', 'Y/G': 'Average number of passing yards gained per game by the player', 'Rate': 'Passer rating of the player', 'QBR': 'Total quarterback rating (QBR) of the player, a more comprehensive measure of quarterback performance'}
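
The attempts gap mentioned in the notes for part 2 can be made concrete with a small sketch. The rows below are copied by hand from the table output above (Player, G, and Att only), so nothing is re-scraped. The `Att/G` and `Att_gap` column names are my own and are not part of the source table, and stripping the trailing `*` and `+` markers from player names is an assumption about how the names should be cleaned.

```python
import pandas as pd

# Six rows copied from the printed table above: player name, games played, pass attempts.
sample = pd.DataFrame(
    {
        "Player": [
            "Patrick Mahomes*+",
            "Justin Herbert",
            "Tom Brady",
            "Jalen Hurts*",
            "Tua Tagovailoa",
            "Jimmy Garoppolo",
        ],
        "G": [17, 17, 17, 15, 13, 11],
        "Att": [648, 699, 733, 460, 400, 308],
    }
)

# Assumption: the trailing '*' and '+' are award markers, not part of the name.
sample["Player"] = sample["Player"].str.rstrip("*+")

# Pass attempts per game played (hypothetical column name).
sample["Att/G"] = (sample["Att"] / sample["G"]).round(1)

# How many fewer attempts each player had than the highest-volume passer in this sample.
sample["Att_gap"] = sample["Att"].max() - sample["Att"]

print(sample.sort_values("Att/G", ascending=False).to_string(index=False))
```

Even in this small sample the volume difference is large: Tom Brady attempted 273 more passes than Jalen Hurts, which is the kind of gap a raw season ranking hides.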

3

'''

This data will be used both to weight how stats are affected by coaching decisions (run vs. pass) and to analyze how some of the "Iron Men" (Tom Brady) are able to get away with attempting so many passes. There is a lot to look at and explore. I haven't yet thought of a way to include clock management, specifically how some coaches choose to go into halftime with "1 timeout, 55 sec, and 60 yards to go" while others try to score. That's a huge factor that I think I can measure using team statistics.

'''
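
One possible way to weight a stat by the run vs. pass decision is sketched below. This is only an illustration of the idea, not a settled method: the function names `pass_rate` and `volume_adjusted` are hypothetical, the linear rescaling is a simplifying assumption, and the numbers in the example calls are placeholders rather than real team totals (team stats are not loaded in this notebook yet).

```python
def pass_rate(pass_att: int, rush_att: int) -> float:
    """Share of a team's offensive plays that were pass attempts.

    Sacks and scrambles are ignored here, which is a simplification.
    """
    total = pass_att + rush_att
    if total == 0:
        raise ValueError("team has no recorded plays")
    return pass_att / total


def volume_adjusted(stat_total: float, team_pass_rate: float, league_pass_rate: float) -> float:
    """Rescale a counting stat (e.g. passing yards) to a league-average pass rate.

    Assumes the stat grows linearly with how often the team passes, which is
    a strong assumption that the project itself would need to test.
    """
    if team_pass_rate <= 0:
        raise ValueError("team_pass_rate must be positive")
    return stat_total * league_pass_rate / team_pass_rate


# Placeholder inputs for illustration only (not real 2022 figures).
example_rate = pass_rate(pass_att=600, rush_att=400)
example_adjusted = volume_adjusted(stat_total=4000, team_pass_rate=example_rate, league_pass_rate=0.5)
print(example_rate, round(example_adjusted, 1))
```

Under this sketch, a quarterback on a pass-heavy team has his counting stats scaled down, and one on a run-heavy team has them scaled up, so the two can be compared on a more equal footing.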

In [ ]: