DS 2500 Project Proposal - Page Lootsma
It is notoriously difficult to compare drivers from different teams and different eras in Formula 1, as each team (consisting of only two drivers) has a different car that changes from race to race. Thus, a driver can only be reasonably compared to their teammate, as this is the only individual in the same machinery and the same set of circumstances. This creates the need for a data-driven method of quantifiably comparing drivers from different teams and different eras.
Typically, two major skills are considered when assessing a driver's ability: their racecraft and their pace. Racecraft refers to a driver's ability to maneuver and overtake other drivers over the course of an entire race. Quantifying a driver's racecraft in a manner that can be compared to other drivers is extremely difficult, as every driver experiences a unique set of circumstances throughout a race that cannot be repeated or emulated.
A more effective means of quantifiably comparing drivers is their pace. Pace is often regarded as the antithesis of racecraft; rather than observing a driver's ability to 'work their way through the pack', pace refers to a driver's raw speed over the course of a single lap, uninhibited by other drivers and pushing their machinery to the absolute limit.
Unlike racecraft, pace can be quantified much more feasibly. In order to determine their position at the start of the race, Formula 1 drivers participate in a qualifying session. The rules for qualifying are complicated, but essentially the drivers are attempting to complete the fastest single lap they can, as the starting order of the race will be determined by their qualifying time. This means that drivers are going flat-out under essentially the same track conditions without taking strategic elements such as tyre wear or traffic into consideration. As a result, drivers' qualifying times provide an invaluable window into the pure pace a driver possesses, which can be directly compared to their teammate's.
A link to the Kaggle dataset can be found here.
Import pandas and read in the CSV file.
import pandas as pd
df_quali = pd.read_csv('qualifying.csv')
df_quali.tail()
| | Car | Detail | Driver | DriverCode | Grand Prix | Laps | No | Pos | Q1 | Q2 | Q3 | Time | Year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
17231 | Haas Ferrari | Qualifying | Kevin Magnussen | MAG | Abu Dhabi | 9.0 | 20 | 16 | 1:25.834 | NaN | NaN | NaN | 2022 |
17232 | AlphaTauri RBPT | Qualifying | Pierre Gasly | GAS | Abu Dhabi | 9.0 | 10 | 17 | 1:25.859 | NaN | NaN | NaN | 2022 |
17233 | Alfa Romeo Ferrari | Qualifying | Valtteri Bottas | BOT | Abu Dhabi | 6.0 | 77 | 18 | 1:25.892 | NaN | NaN | NaN | 2022 |
17234 | Williams Mercedes | Qualifying | Alexander Albon | ALB | Abu Dhabi | 9.0 | 23 | 19 | 1:26.028 | NaN | NaN | NaN | 2022 |
17235 | Williams Mercedes | Qualifying | Nicholas Latifi | LAT | Abu Dhabi | 9.0 | 6 | 20 | 1:26.054 | NaN | NaN | NaN | 2022 |
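The lap times in the `Q1`, `Q2`, `Q3`, and `Time` columns appear as strings such as `1:25.834`, so they would need to be converted to numbers before any comparison. Below is a minimal sketch of one way to do that. The helper name `lap_time_to_seconds` is hypothetical, and the handling of entries without a minutes part or with non-numeric text is an assumption about what the dataset may contain.

```python
import math


def lap_time_to_seconds(lap_time):
    """Convert a lap-time string such as '1:25.834' to seconds (float).

    Missing or unparseable values are returned as NaN.
    """
    # Missing entries appear as NaN (a float) in the DataFrame
    if lap_time is None or (isinstance(lap_time, float) and math.isnan(lap_time)):
        return math.nan
    text = str(lap_time).strip()
    try:
        if ":" in text:
            minutes, seconds = text.split(":", 1)
            return int(minutes) * 60 + float(seconds)
        # Assumed fallback: a time with no minutes part, e.g. '59.431'
        return float(text)
    except ValueError:
        # Assumption: any non-numeric entry is treated as "no time set"
        return math.nan


print(lap_time_to_seconds("1:25.834"))
```

Applied to the data above, something like `df_quali['Q1'].map(lap_time_to_seconds)` would produce a numeric column that can be subtracted and averaged.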
Create a dictionary of all drivers, with each key being a driver's name and each value a DataFrame of that driver's qualifying results.
driver_list = df_quali['Driver'].unique()
all_drivers = {}
for driver in driver_list:
    bool_driver = df_quali['Driver'] == driver
    df_driver = df_quali.loc[bool_driver, :]
    all_drivers[driver] = df_driver
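As an alternative sketch, the same kind of dictionary can be built in one pass with `groupby`. The small `df_example` frame below is a stand-in for `df_quali`, using a few rows that appear in this notebook's output, so that the snippet runs on its own; the names `df_example` and `all_drivers_example` are hypothetical.

```python
import pandas as pd

# Stand-in for df_quali: a handful of rows taken from the notebook output
df_example = pd.DataFrame({
    "Driver": ["Max Verstappen", "Max Verstappen",
               "Kevin Magnussen", "Pierre Gasly"],
    "Grand Prix": ["Australia", "Malaysia", "Abu Dhabi", "Abu Dhabi"],
    "Year": [2015, 2015, 2022, 2022],
    "Q1": ["1:29.248", "1:40.793", "1:25.834", "1:25.859"],
})

# groupby yields (driver name, sub-DataFrame) pairs, one per driver
all_drivers_example = {
    driver: df_driver for driver, df_driver in df_example.groupby("Driver")
}

print(all_drivers_example["Max Verstappen"])
```

This avoids re-scanning the whole frame once per driver, which may matter as the dataset grows.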
Showcase how different drivers can be accessed through the dictionary.
all_drivers["Max Verstappen"]
| | Car | Detail | Driver | DriverCode | Grand Prix | Laps | No | Pos | Q1 | Q2 | Q3 | Time | Year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13625 | STR Renault | Qualifying | Max Verstappen | VER | Australia | 15.0 | 33 | 12 | 1:29.248 | 1:28.868 | NaN | NaN | 2015 |
13637 | STR Renault | Qualifying | Max Verstappen | VER | Malaysia | 16.0 | 33 | 6 | 1:40.793 | 1:41.430 | 1:51.981 | NaN | 2015 |
13663 | STR Renault | Qualifying | Max Verstappen | VER | China | 14.0 | 33 | 13 | 1:38.387 | 1:38.393 | NaN | NaN | 2015 |
13685 | STR Renault | Qualifying | Max Verstappen | VER | Bahrain | 14.0 | 33 | 15 | 1:35.611 | 1:35.103 | NaN | NaN | 2015 |
13696 | STR Renault | Qualifying | Max Verstappen | VER | Spain | 20.0 | 33 | 6 | 1:27.393 | 1:26.441 | 1:26.249 | NaN | 2015 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
17136 | Red Bull Racing RBPT | Qualifying | Max Verstappen | VER | Japan | 13.0 | 1 | 1 | 1:30.224 | 1:30.346 | 1:29.304 | NaN | 2022 |
17158 | Red Bull Racing RBPT | Qualifying | Max Verstappen | VER | United States | 15.0 | 1 | 3 | 1:35.864 | 1:35.294 | 1:34.448 | NaN | 2022 |
17176 | Red Bull Racing RBPT | Qualifying | Max Verstappen | VER | Mexico | 16.0 | 1 | 1 | 1:19.222 | 1:18.566 | 1:17.775 | NaN | 2022 |
17197 | Red Bull Racing RBPT | Qualifying | Max Verstappen | VER | Brazil | 23.0 | 1 | 2 | 1:13.625 | 1:10.881 | 1:11.877 | NaN | 2022 |
17216 | Red Bull Racing RBPT | Qualifying | Max Verstappen | VER | Abu Dhabi | 17.0 | 1 | 1 | 1:24.754 | 1:24.622 | 1:23.824 | NaN | 2022 |
179 rows × 13 columns
all_drivers["Ayrton Senna"]
| | Car | Detail | Driver | DriverCode | Grand Prix | Laps | No | Pos | Q1 | Q2 | Q3 | Time | Year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
807 | Toleman Hart | Qualifying | Ayrton Senna | SEN | Brazil | NaN | 19 | 16 | NaN | NaN | NaN | 1:33.525 | 1984 |
828 | Toleman Hart | Qualifying | Ayrton Senna | SEN | South Africa | NaN | 19 | 13 | NaN | NaN | NaN | 1:06.981 | 1984 |
859 | Toleman Hart | Qualifying | Ayrton Senna | SEN | Belgium | NaN | 19 | 19 | NaN | NaN | NaN | 1:18.876 | 1984 |
891 | Toleman Hart | Qualifying | Ayrton Senna | SEN | San Marino | NaN | 19 | 26 | NaN | NaN | NaN | 1:41.585 | 1984 |
904 | Toleman Hart | Qualifying | Ayrton Senna | SEN | France | NaN | 19 | 13 | NaN | NaN | NaN | 1:05.744 | 1984 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5448 | McLaren Ford | Qualifying | Ayrton Senna | SEN | Japan | NaN | 8 | 2 | NaN | NaN | NaN | 1:37.284 | 1993 |
5471 | McLaren Ford | Qualifying | Ayrton Senna | SEN | Australia | NaN | 8 | 1 | NaN | NaN | NaN | 1:13.371 | 1993 |
5495 | Williams Renault | Qualifying | Ayrton Senna | SEN | Brazil | 22.0 | 2 | 1 | NaN | NaN | NaN | 1:15.962 | 1994 |
5522 | Williams Renault | Qualifying | Ayrton Senna | SEN | Pacific | 15.0 | 2 | 1 | NaN | NaN | NaN | 1:10.218 | 1994 |
5550 | Williams Renault | Qualifying | Ayrton Senna | SEN | San Marino | 10.0 | 2 | 1 | NaN | NaN | NaN | 1:21.548 | 1994 |
162 rows × 13 columns
From here, the approach can be extended to link drivers to one another through their teammates at specific races.
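A minimal sketch of what a single teammate link might look like: pair up drivers who share the same year, Grand Prix, and car, then take the difference of their lap times. The rows are the Abu Dhabi 2022 entries from the output shown earlier. Treating the `Car` string as the team identifier and comparing `Q1` times only are simplifying assumptions, and the names `quali`, `pairs`, and `to_seconds` are hypothetical.

```python
import pandas as pd


def to_seconds(lap_time):
    """Convert 'M:SS.mmm' to seconds; assumes a well-formed string."""
    minutes, seconds = lap_time.split(":")
    return int(minutes) * 60 + float(seconds)


quali = pd.DataFrame({
    "Year": [2022, 2022, 2022],
    "Grand Prix": ["Abu Dhabi", "Abu Dhabi", "Abu Dhabi"],
    "Car": ["Williams Mercedes", "Williams Mercedes", "Haas Ferrari"],
    "Driver": ["Alexander Albon", "Nicholas Latifi", "Kevin Magnussen"],
    "Q1": ["1:26.028", "1:26.054", "1:25.834"],
})
quali["Q1_s"] = quali["Q1"].map(to_seconds)

# Self-merge on the session and car so each driver meets their teammate
keys = ["Year", "Grand Prix", "Car"]
pairs = quali.merge(quali, on=keys, suffixes=("_a", "_b"))

# Drop self-pairs and keep each teammate pair once (alphabetical order)
pairs = pairs[pairs["Driver_a"] < pairs["Driver_b"]].copy()

# Positive gap means driver b was slower than driver a in this session
pairs["gap_s"] = pairs["Q1_s_b"] - pairs["Q1_s_a"]

print(pairs[["Year", "Grand Prix", "Driver_a", "Driver_b", "gap_s"]])
```

Averaging `gap_s` over every session a pair shared would give one number per teammate pairing, which is the building block for comparing drivers who never drove the same car.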
Essentially, this creates a “chain” through which we can establish a comparison between almost every driver on the grid. For example, if a user were seeking to compare 2021 and 2022 Formula 1 champion Max Verstappen with the late 1988, 1990, and 1991 champion Ayrton Senna, the program would run through Formula 1 drivers to establish the most robust connection between the two via intermediary teammate comparisons.
Additionally, because drivers perform better on certain types of circuits, the program would ask the user for a specific circuit on which to compare the drivers, say the famous Spa-Francorchamps in Belgium. The program would then establish a connection between the two inputted drivers using their current and former teammates before calculating an estimated time difference between the two drivers on the specific track in question.
For Formula 1 fans, this project serves as a way to predict how different drivers would compare if they were operating the same machinery. Additionally (and more practically), this could also provide Formula 1 teams with an invaluable resource for determining which drivers are faster, and therefore which they should sign, come the Formula 1 "silly season".
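One possible way to find such a chain is a breadth-first search over a graph whose nodes are drivers and whose edges join teammates. The sketch below returns the shortest chain, which is only a stand-in for the "most robust" connection, since robustness would also depend on how many sessions each pair shared. The function name `find_teammate_chain` and the placeholder driver names are hypothetical and do not describe real teammate relationships.

```python
from collections import deque


def find_teammate_chain(teammates, start, goal):
    """Return the shortest list of drivers linking start to goal, or None.

    teammates maps each driver to the set of drivers they were paired with.
    """
    if start == goal:
        return [start]
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in teammates.get(current, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = current
            if neighbor == goal:
                # Walk back from goal to start, then reverse
                chain = [goal]
                while previous[chain[-1]] is not None:
                    chain.append(previous[chain[-1]])
                return chain[::-1]
            queue.append(neighbor)
    return None


# Placeholder graph (hypothetical drivers, not real pairings)
teammates = {
    "Driver A": {"Driver B", "Driver E"},
    "Driver B": {"Driver A", "Driver C"},
    "Driver C": {"Driver B", "Driver D"},
    "Driver D": {"Driver C"},
    "Driver E": {"Driver A"},
    "Driver F": set(),
}

print(find_teammate_chain(teammates, "Driver A", "Driver D"))
```

A weighted search that prefers pairings with more shared sessions would be a natural refinement if the shortest chain turns out to rely on thin data.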
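Once a chain exists, the estimated difference could be computed by adding up the signed teammate gaps along it. The sketch below assumes that `pair_gaps` already holds an average gap in seconds for each teammate pair, computed only from sessions at the chosen circuit; that filtering step, the function name `estimate_gap`, and all numbers shown are hypothetical.

```python
def estimate_gap(chain, pair_gaps):
    """Sum signed teammate gaps along a chain of drivers.

    pair_gaps[(a, b)] is the average number of seconds by which b was
    slower than a. The result is how much slower the last driver in the
    chain is estimated to be than the first (negative means faster).
    """
    total = 0.0
    for a, b in zip(chain, chain[1:]):
        if (a, b) in pair_gaps:
            total += pair_gaps[(a, b)]
        elif (b, a) in pair_gaps:
            # Gap stored in the opposite direction, so flip the sign
            total -= pair_gaps[(b, a)]
        else:
            raise KeyError(f"No teammate data for {a} and {b}")
    return total


# Hypothetical average gaps in seconds at one circuit
pair_gaps = {
    ("Driver A", "Driver B"): 0.2,  # B is 0.2 s slower than A
    ("Driver C", "Driver B"): 0.1,  # B is 0.1 s slower than C
}

print(estimate_gap(["Driver A", "Driver B", "Driver C"], pair_gaps))
```

Summing raw seconds treats a gap measured in one car and era as equivalent to the same gap in another, so the uncertainty likely grows with every link in the chain.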