This project aims to predict the winner of a Formula 1 race from features such as driver performance, car specifications, and race circuits. Such predictions can help teams see how their drivers are likely to perform against others, based on the historical performance of drivers, cars, and tracks. The motivation is to help teams, fans, and stakeholders in the sport make better predictions and better-informed decisions.
Formula 1 is one of the most popular motorsports in the world, and its recent growth is often attributed in part to the series Drive to Survive. The dataset used here covers 221 races between 2012 and 2022, which gives a reasonable basis for a prediction algorithm. The goal is to identify the winner of a race based on past results and performance.
The impact of this project can be significant. Predicting race results accurately can help teams optimize their strategies, improve their chances of winning, and make informed decisions about car design and setup. Fans can also benefit from accurate predictions, as they can make more informed bets, participate in fantasy leagues, and enjoy a more engaging viewing experience. Additionally, stakeholders in the sport, such as broadcasters, sponsors, and organizers, can use the predictions to enhance the overall experience of the sport and attract more viewership and investment. Overall, this project can contribute to the advancement of the sport and the growth of its fanbase.
We will use a Kaggle Dataset of Formula 1 Race Data:
Here's the link to view the table data and headers: https://ibb.co/rGSFRQp
Our project will track team and driver performance and build a record, organized by race and driver, that we can use to predict the potential winner.
import numpy as np
import pandas as pd
data = pd.read_csv("F1_Seasons_champions.csv")
data
Index | Unnamed: 0 | Grand Prix | Circuit | Date | Winner | Team | Laps | Race Time |
---|---|---|---|---|---|---|---|---|
0 | 0 | Bahrain | Bahrain International Circuit | 20 March 2022 | Charles Leclerc | Ferrari | 57 | 1:37:33.584 |
1 | 1 | Saudi Arabia | Jeddah Corniche Circuit | 27 March 2022 | Max Verstappen | Red Bull RBPT | 50 | 1:24:19.293 |
2 | 2 | Australia | Albert Park Circuit | 10 April 2022 | Charles Leclerc | Ferrari | 58 | 1:27:46.548 |
3 | 3 | Emilia Romagna | Autodromo Enzo e Dino Ferrari | 24 April 2022 | Max Verstappen | Red Bull RBPT | 63 | 1:32:07.986 |
4 | 4 | Miami | Miami International Autodrome | 8 May 2022 | Max Verstappen | Red Bull RBPT | 57 | 1:34:24.258 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
216 | 216 | South Korea | Korean International Circuit | 14 October 2012 | Sebastian Vettel | Red Bull Renault | 55 | 1:36:28.651 |
217 | 217 | India | Buddh International Circuit | 28 October 2012 | Sebastian Vettel | Red Bull Renault | 60 | 1:31:10.744 |
218 | 218 | Abu Dhabi | Yas Marina Circuit | 4 November 2012 | Kimi Räikkönen | Lotus Renault | 55 | 1:45:58.667 |
219 | 219 | United States | Circuit of The Americas | 18 November 2012 | Lewis Hamilton | McLaren Mercedes | 56 | 1:35:55.269 |
220 | 220 | Brazil | Autódromo José Carlos Pace | 25 November 2012 | Jenson Button | McLaren Mercedes | 71 | 1:45:22.656 |
221 rows × 8 columns
Feature Name | Definition | Data Type | Units of Measurement |
---|---|---|---|
Grand Prix | Name of the race | String | N/A |
Circuit | Name of the track | String | N/A |
Date | Date the race was held | DateTime | D Month YYYY (e.g. 20 March 2022) |
Winner | Name of the winning driver | String | N/A |
Team | Name of the winning team | String | N/A |
Laps | Number of laps completed by the winning driver | Integer | Laps |
Race Time | Time the winner took to finish the race | Duration | H:MM:SS.mmm |
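The Date and Race Time columns are read from the CSV as plain strings, so they need converting before they can be used as features. Below is a minimal sketch of that conversion. It uses a few rows copied from the table above instead of the CSV file, and the `sample` frame and the `Race Minutes` column name are assumptions made for illustration.

```python
import pandas as pd

# A few rows copied from the table above, so the sketch runs without the CSV.
sample = pd.DataFrame({
    "Grand Prix": ["Bahrain", "Saudi Arabia", "Brazil"],
    "Date": ["20 March 2022", "27 March 2022", "25 November 2012"],
    "Winner": ["Charles Leclerc", "Max Verstappen", "Jenson Button"],
    "Laps": [57, 50, 71],
    "Race Time": ["1:37:33.584", "1:24:19.293", "1:45:22.656"],
})

# Dates are written as "day month-name year", e.g. "20 March 2022".
sample["Date"] = pd.to_datetime(sample["Date"], format="%d %B %Y")

# Race times are durations (H:MM:SS.mmm); store them as minutes.
# "Race Minutes" is a hypothetical column name, not part of the dataset.
race_time = pd.to_timedelta(sample["Race Time"])
sample["Race Minutes"] = race_time.dt.total_seconds() / 60

print(sample.dtypes)
```

The same two conversions could be applied to the full `data` frame, provided every row follows the formats shown in the table; rows in another format would need separate handling.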
The data has limitations. A win does not always reflect merit or on-track performance alone, so the Winner column is a noisy measure of driver ability. The driver lineup also changes often, so we need a way to track the performance of new drivers. The prediction can never be 100% correct, and we are using only basic metrics, but we could incorporate other aspects of a Formula 1 race, such as average team experience and funding.
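One possible way to track a driver's record over time is to count how many races the driver had already won before each race. The sketch below does this on five rows taken from the table above. The `races` frame and the `prior_wins` column are hypothetical names, and this is only one of several reasonable history features.

```python
import pandas as pd

# Five 2022 rows from the table above (dates given in ISO form for brevity).
races = pd.DataFrame({
    "Date": pd.to_datetime([
        "2022-03-20", "2022-03-27", "2022-04-10", "2022-04-24", "2022-05-08",
    ]),
    "Winner": [
        "Charles Leclerc", "Max Verstappen", "Charles Leclerc",
        "Max Verstappen", "Max Verstappen",
    ],
})

# Sort chronologically so "earlier" really means earlier in time.
races = races.sort_values("Date").reset_index(drop=True)

# cumcount() numbers each driver's wins 0, 1, 2, ... in date order, which
# equals the number of wins that driver had before the current race.
# A driver's first win therefore gets prior_wins == 0.
races["prior_wins"] = races.groupby("Winner").cumcount()

print(races)
```

Because the dataset lists only race winners, a driver who has never won does not appear in it at all. Tracking genuinely new drivers would need additional data, such as full race results.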
We will approach the problem with a K-Nearest Neighbors (KNN) classifier, which predicts the winning driver from the other features of the dataset. We can also run a regression analysis to estimate the winner. KNN can use Euclidean distance, which combines the differences across multiple attributes into a single number that measures how similar two races are.
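To make the idea concrete, here is a minimal sketch of a KNN classifier with Euclidean distance, written with NumPy only. It is an illustration under assumptions, not the project's final model: the function `knn_predict`, the choice of season year and lap count as features, and the ten training rows (copied from the table above) are all chosen for the example.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Predict a label by majority vote among the k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Standardize each feature so that no single attribute dominates
    # the distance just because of its scale.
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Z = (X_train - mean) / std
    z = (x_query - mean) / std

    # Euclidean distance: one number summarizing all feature differences.
    dists = np.sqrt(((Z - z) ** 2).sum(axis=1))

    # Indices of the k closest rows; stable sort keeps ties in row order.
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Features: [season year, laps]; labels: winning driver (rows from the table).
X = [[2022, 57], [2022, 50], [2022, 58], [2022, 63], [2022, 57],
     [2012, 55], [2012, 60], [2012, 55], [2012, 56], [2012, 71]]
y = ["Charles Leclerc", "Max Verstappen", "Charles Leclerc", "Max Verstappen",
     "Max Verstappen", "Sebastian Vettel", "Sebastian Vettel",
     "Kimi Räikkönen", "Lewis Hamilton", "Jenson Button"]

print(knn_predict(X, y, [2022, 60], k=3))
```

Only features known before a race starts are used here. Race Time is left out because it is an outcome of the race, so including it would leak the answer into the prediction. Categorical columns such as Circuit and Team would need to be encoded as numbers before a Euclidean distance could be applied to them.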