# NCAA quarterback
''' Instructions:
Each individual student will submit a project proposal (3% of final grade) in .ipynb format which:
(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).
(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.
(1%) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do. For example:
“We’ll cluster the movies into sets of movies which are often watched by the same users. Doing so allows us to discover if there is a more natural grouping of movies rather than the traditional genres: horror, comedy, romantic-comedy, etc”.
'''
'''
I would like to target a problem in sports that I believe could use the insight of data science. Specifically, I would like to target the NFL and its most popular position: the quarterback. While this problem is far more evident in the NCAA, I believe it persists in the NFL, albeit on a much smaller scale. Every year, quarterback stats and ranking lists get highlighted by sports media. This coverage is, in part, meant to drum up attention or hype for the participating players and teams. What should be reserved for casual conversation has now made its way into contract negotiations, TV screens, and even entire webpages. If you're going to subjectively rank people and use these rankings as real-life ammunition, the least you can do is be accurate with them. That is where I think data science comes into play. I believe both current and all-time quarterback stats should be adjusted in a number of ways when displayed. If you are comparing players from decades apart, you must account for several obvious variables, like changes in rules, penalties, and playcalling. I believe some of the same principles apply when comparing players from the same season, specifically playcalling. I think how conservatively or aggressively a team is coached and playcalled has a much larger impact on the statistics of the team's quarterback than is realized. I would like to use data science to measure this.
https://www.pro-football-reference.com/years/2022/passing.htm
https://www.pro-football-reference.com/years/2022/index.htm#all_team_stats
'''
#2
import pandas as pd
# Load the dataset from the URL
url = 'https://www.pro-football-reference.com/years/2022/passing.htm'
df = pd.read_html(url)[0]
# Display the first 25 rows of the dataset
print(df.head(25))
# Create a data dictionary that explains the meaning of each feature
data_dict = {
'Rk': 'Rank of the player based on the selected statistic',
'Player': 'Name of the player',
'Tm': 'Name of the team the player belongs to',
'Age': 'Age of the player',
'Pos': 'Position of the player',
'G': 'Number of games played by the player',
'GS': 'Number of games started by the player',
'Cmp': 'Total number of pass completions made by the player',
'Att': 'Total number of pass attempts made by the player',
'Cmp%': 'Percentage of pass completions made by the player',
'Yds': 'Total number of passing yards gained by the player',
'TD': 'Total number of passing touchdowns made by the player',
'TD%': 'Percentage of passing attempts that resulted in a touchdown made by the player',
'Int': 'Total number of interceptions thrown by the player',
'Int%': 'Percentage of passing attempts that resulted in an interception thrown by the player',
'Lng': 'Longest pass completed by the player in yards',
'Y/A': 'Average number of yards gained per pass attempt by the player',
'AY/A': 'Average number of adjusted yards gained per pass attempt by the player, taking into account touchdowns and interceptions',
'Y/C': 'Average number of yards gained per pass completion by the player',
'Y/G': 'Average number of passing yards gained per game by the player',
'Rate': 'Passer rating of the player',
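'QBR': 'Total quarterback rating (QBR) of the player, a more comprehensive measure of quarterback performance',
'QBrec': "Team's win-loss-tie record in games started by the player",
'Sk': 'Number of times the player was sacked',
'Yds.1': 'Yards lost to sacks (pandas renames the second "Yds" column to "Yds.1")',
'Sk%': 'Percentage of dropbacks (pass attempts plus sacks) on which the player was sacked',
'NY/A': 'Net yards gained per pass attempt: (passing yards - sack yards) / (pass attempts + sacks)',
'ANY/A': 'Adjusted net yards per pass attempt: NY/A with a bonus for touchdowns and a penalty for interceptions',
'4QC': 'Number of fourth-quarter comebacks led by the player',
'GWD': 'Number of game-winning drives led by the player',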
}
# Print the data dictionary
print(data_dict)
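'''
A minimal cleaning sketch for the table loaded above, under a few assumptions: that the scraped table can repeat its header row partway down (rows whose Rk value is the literal string 'Rk'), and that count columns may arrive as strings. The helper name clean_passing_table and the small hand-made frame are illustrative only; the three player rows reuse values from the 2022 passing table shown in this notebook.
'''
```python
import pandas as pd

def clean_passing_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Tidy a scraped passing table (layout is assumed, see comments)."""
    # Assumption: repeated header rows carry the literal string 'Rk' in 'Rk'.
    df = raw.loc[raw['Rk'].astype(str) != 'Rk'].copy()
    # Some names carry trailing '*' / '+' markers; strip them for clean joins.
    df['Player'] = df['Player'].str.replace(r'[*+]+$', '', regex=True).str.strip()
    # Convert count columns to numbers; anything unparseable becomes NaN.
    for col in ['Age', 'G', 'GS', 'Cmp', 'Att']:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors='coerce')
    return df.reset_index(drop=True)

# Hand-made frame that mimics the scraped layout, including one repeated header row.
raw = pd.DataFrame({
    'Rk': ['1', '2', 'Rk', '3'],
    'Player': ['Patrick Mahomes*+', 'Justin Herbert', 'Player', 'Tom Brady'],
    'Age': ['27', '24', 'Age', '45'],
    'G': ['17', '17', 'G', '17'],
    'GS': ['17', '17', 'GS', '17'],
    'Cmp': ['435', '477', 'Cmp', '490'],
    'Att': ['648', '699', 'Att', '733'],
})
clean = clean_passing_table(raw)
print(clean)
```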
'''
I should probably include team stats as well, but I think this dataset is sufficient to demonstrate the problem.
Players are being compared as equals even though some have attempted 100, 200, or even 300 more passes
than their peers. This needs to be measured. Pitchers who can throw more pitches per game are more valuable, right?
It's not the same thing at all, I know, but it is still a variable that isn't strongly considered.
'''
    Rk             Player   Tm  Age  Pos   G  GS   QBrec  Cmp  Att  ...    Y/G  \
0    1  Patrick Mahomes*+  KAN   27   QB  17  17  14-3-0  435  648  ...  308.8
1    2     Justin Herbert  LAC   24   QB  17  17  10-7-0  477  699  ...  278.8
2    3          Tom Brady  TAM   45   QB  17  17   8-9-0  490  733  ...  276.1
3    4      Kirk Cousins*  MIN   34   QB  17  17  13-4-0  424  643  ...  267.5
4    5        Joe Burrow*  CIN   26   QB  16  16  12-4-0  414  606  ...  279.7
5    6        Jared Goff*  DET   28   QB  17  17   9-8-0  382  587  ...  261.1
6    7        Josh Allen*  BUF   26   QB  16  16  13-3-0  359  567  ...  267.7
7    8        Geno Smith*  SEA   32   QB  17  17   9-8-0  399  572  ...  251.9
8    9   Trevor Lawrence*  JAX   23   QB  17  17   9-8-0  387  584  ...  241.9
9   10       Jalen Hurts*  PHI   24   QB  15  15  14-1-0  306  460  ...  246.7
10  11      Aaron Rodgers  GNB   39   QB  17  17   8-9-0  350  542  ...  217.4
11  12     Tua Tagovailoa  MIA   24   QB  13  13   8-5-0  259  400  ...  272.9
12  13     Russell Wilson  DEN   34   QB  15  15  4-11-0  292  483  ...  234.9
13  14        Derek Carr*  LVR   31   QB  15  15   6-9-0  305  502  ...  234.8
14  15       Daniel Jones  NYG   25   QB  16  16   9-6-1  317  472  ...  200.3
15  16        Davis Mills  HOU   24   QB  15  15  3-10-1  292  479  ...  207.9
16  17          Matt Ryan  IND   37   QB  12  12   4-7-1  309  461  ...  254.8
17  18          Mac Jones  NWE   24   QB  14  14   6-8-0  288  442  ...  214.1
18  19        Andy Dalton  NOR   35   QB  14  14   6-8-0  252  378  ...  205.1
19  20       Dak Prescott  DAL   29   QB  12  12   8-4-0  261  394  ...  238.3
20  21    Jacoby Brissett  CLE   30   QB  16  11   4-7-0  236  369  ...  163.0
21  22     Ryan Tannehill  TEN   34   QB  12  12   6-6-0  212  325  ...  211.3
22  23    Jimmy Garoppolo  SFO   31   QB  11  10   7-3-0  207  308  ...  221.5
23  24      Kenny Pickett  PIT   24   QB  13  12   7-5-0  245  389  ...  184.9
24  25       Kyler Murray  ARI   25   QB  11  11   3-8-0  259  390  ...  215.3

     Rate   QBR  Sk  Yds.1   Sk%  NY/A  ANY/A  4QC  GWD
0   105.2  77.6  26    188   3.9  7.51   7.93    4    4
1    93.2  58.3  38    206   5.2  6.15   6.22    4    5
2    90.7  52.5  22    160   2.9  6.01   6.13    4    5
3    92.5  49.9  46    329   6.7  6.12   6.05    8    8
4   100.8  58.7  41    259   6.3  6.52   6.76    3    4
5    99.3  61.1  23    156   3.8  7.02   7.45    3    3
6    96.6  71.4  33    162   5.5  6.87   6.99    3    4
7   100.9  60.8  46    348   7.4  6.37   6.54    2    3
8    95.2  54.6  27    184   4.4  6.43   6.66    3    2
9   101.5  66.3  38    231   7.6  6.97   7.31    1    2
10   91.1  39.3  32    258   5.6  5.99   5.95    3    4
11  105.5  68.8  21    163   5.0  8.04   8.37    2    2
12   84.4  37.0  55    368  10.2  5.87   5.54    3    3
13   86.3  55.6  27    191   5.1  6.30   6.01    4    3
14   92.5  60.8  44    243   8.5  5.74   5.89    4    5
15   78.8  33.2  31    244   6.1  5.64   4.98    2    3
16   83.9  43.1  38    287   7.6  5.55   4.94    5    4
17   84.8  36.1  34    231   7.1  5.81   5.36    0    0
18   95.2  50.7  25    189   6.2  6.66   6.54    1    1
19   91.1  57.9  20    126   4.8  6.60   6.08    2    2
20   88.9  59.9  24    160   6.1  6.23   6.15    2    2
21   94.6  49.2  33    238   9.2  6.42   6.39  NaN  NaN
22  103.0  54.4  18    100   5.5  7.17   7.60    1    1
23   76.7  51.6  27    182   6.5  5.34   4.70    3    4
24   87.2  51.7  25    192   6.0  5.24   5.16    1    1

[25 rows x 31 columns]
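'''
To make the volume gap concrete, here is a small sketch that computes pass attempts per game for four of the quarterbacks in the printed output. The G and Att values are copied from that output; the derived column names Att/G and Att_gap_vs_max are my own labels, not columns from the source table.
'''
```python
import pandas as pd

# G and Att copied from the printed 2022 passing table.
sample = pd.DataFrame({
    'Player': ['Patrick Mahomes', 'Tom Brady', 'Jalen Hurts', 'Tua Tagovailoa'],
    'G': [17, 17, 15, 13],
    'Att': [648, 733, 460, 400],
})

# Attempts per game removes the effect of missed games from raw volume.
sample['Att/G'] = (sample['Att'] / sample['G']).round(1)
# How many fewer attempts each player had than the highest-volume passer here.
sample['Att_gap_vs_max'] = sample['Att'].max() - sample['Att']

print(sample.sort_values('Att/G', ascending=False))
```
'''
Even per game, the spread stays wide in this sample, which suggests the gap is not only about missed games.
'''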
'''
This data will be used both to weight how stats are impacted by coaching decisions (run vs. pass) and to analyze how some of the "Iron Men" (Tom Brady) are able to get away with attempting so many passes. There is a lot to look at and explore. I haven't yet thought of a way to include clock management. Specifically, some coaches choose to go into the half with "1 timeout, 55 sec, and 60 yards to go" while others try to score. That's a huge factor I think I can measure using team statistics.
'''
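'''
One possible way to weight passing stats by run vs. pass playcalling, sketched under heavy assumptions. The team rows below are hypothetical numbers made up for illustration (not real 2022 totals), the function name volume_adjust and the columns PassAtt, RushAtt, PassYds, PassRate, and AdjYds are my own, and the adjustment itself is a naive first attempt: it holds each team's yards per attempt fixed and rescales attempts to the league-wide pass rate. It ignores sacks and game situation.
'''
```python
import pandas as pd

# HYPOTHETICAL team totals, invented for illustration only.
teams = pd.DataFrame({
    'Tm': ['AAA', 'BBB', 'CCC'],
    'PassAtt': [700, 550, 450],
    'RushAtt': [380, 450, 550],
    'PassYds': [4900, 3850, 3375],
})

def volume_adjust(teams: pd.DataFrame) -> pd.DataFrame:
    """Rescale passing yards to a league-average pass rate (naive sketch)."""
    out = teams.copy()
    out['Plays'] = out['PassAtt'] + out['RushAtt']
    # Share of plays that were passes: higher means more aggressive playcalling.
    out['PassRate'] = out['PassAtt'] / out['Plays']
    league_rate = out['PassAtt'].sum() / out['Plays'].sum()
    # Efficiency per attempt is kept as-is; only the volume is rescaled.
    out['Y/A'] = out['PassYds'] / out['PassAtt']
    out['AdjYds'] = out['Y/A'] * out['Plays'] * league_rate
    return out

adjusted = volume_adjust(teams)
print(adjusted[['Tm', 'PassRate', 'PassYds', 'AdjYds']].round(3))
```
'''
In this toy example the pass-heavy team's yardage shrinks and the run-heavy team's grows, which is the kind of correction I have in mind.
'''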