Formula 1 Race Prediction - Aashu Kedia¶

Motivation:¶

Problem¶

Determining the winner of a Formula 1 race based on features such as driver performance, car specifications, race circuits, and more. This can help teams see how their drivers are likely to perform against others, based on the historical performance of drivers, cars, and tracks. The motivation behind this project is to help teams, fans, and stakeholders in the sport make better predictions and better-informed decisions.

Solution¶

Formula 1 is one of the most popular racing sports in the world, and its recent growth is often attributed to the popularity of Drive to Survive. The scope of the available data allows for a good prediction algorithm. The goal is to identify the winner of a race based on past results and performance.

Impact¶

The impact of this project can be significant. Predicting race results accurately can help teams optimize their strategies, improve their chances of winning, and make informed decisions about car design and setup. Fans can also benefit from accurate predictions, as they can make more informed bets, participate in fantasy leagues, and enjoy a more engaging viewing experience. Additionally, stakeholders in the sport, such as broadcasters, sponsors, and organizers, can use the predictions to enhance the overall experience of the sport and attract more viewership and investment. Overall, this project can contribute to the advancement of the sport and the growth of its fanbase.

Dataset¶

Detail¶

We will use a Kaggle dataset of Formula 1 race data with the following columns:

  • Grand Prix
  • Circuit
  • Date
  • Winner
  • Team
  • Laps
  • Race Time

Here's the link to view the table data and headers: https://ibb.co/rGSFRQp

Our project will track team and driver performance and build a record for each race and each driver, which we will use to predict the potential winner.

In [1]:
import numpy as np
import pandas as pd
In [4]:
data = pd.read_csv("F1_Seasons_champions.csv")
data
Out[4]:
     Unnamed: 0  Grand Prix      Circuit                        Date              Winner            Team              Laps  Race Time
0    0           Bahrain         Bahrain International Circuit  20 March 2022     Charles Leclerc   Ferrari           57    1:37:33.584
1    1           Saudi Arabia    Jeddah Corniche Circuit        27 March 2022     Max Verstappen    Red Bull RBPT     50    1:24:19.293
2    2           Australia       Albert Park Circuit            10 April 2022     Charles Leclerc   Ferrari           58    1:27:46.548
3    3           Emilia Romagna  Autodromo Enzo e Dino Ferrari  24 April 2022     Max Verstappen    Red Bull RBPT     63    1:32:07.986
4    4           Miami           Miami International Autodrome  8 May 2022        Max Verstappen    Red Bull RBPT     57    1:34:24.258
...  ...         ...             ...                            ...               ...               ...               ...   ...
216  216         South Korea     Korean International Circuit   14 October 2012   Sebastian Vettel  Red Bull Renault  55    1:36:28.651
217  217         India           Buddh International Circuit    28 October 2012   Sebastian Vettel  Red Bull Renault  60    1:31:10.744
218  218         Abu Dhabi       Yas Marina Circuit             4 November 2012   Kimi Räikkönen    Lotus Renault     55    1:45:58.667
219  219         United States   Circuit of The Americas        18 November 2012  Lewis Hamilton    McLaren Mercedes  56    1:35:55.269
220  220         Brazil          Autódromo José Carlos Pace     25 November 2012  Jenson Button     McLaren Mercedes  71    1:45:22.656

221 rows × 8 columns
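
The `Unnamed: 0` column in the output above duplicates the row index, and `Date` and `Race Time` are read in as plain strings. The following is a minimal sketch of one way to tidy the frame. It assumes the date format `20 March 2022` and the time format `1:37:33.584` seen in the output hold for every row, and it uses two inline sample rows in place of the CSV so that it runs on its own; the name `clean` is only illustrative.

```python
import io

import pandas as pd

# Two rows copied from the output above; in the notebook the real CSV file
# would be read instead of this inline sample.
sample_csv = io.StringIO(
    ",Grand Prix,Circuit,Date,Winner,Team,Laps,Race Time\n"
    "0,Bahrain,Bahrain International Circuit,20 March 2022,Charles Leclerc,Ferrari,57,1:37:33.584\n"
    "1,Saudi Arabia,Jeddah Corniche Circuit,27 March 2022,Max Verstappen,Red Bull RBPT,50,1:24:19.293\n"
)

# index_col=0 uses the unnamed first column as the index instead of a data column.
clean = pd.read_csv(sample_csv, index_col=0)

# Assumption: every date looks like "20 March 2022" (day, full month name, year).
clean["Date"] = pd.to_datetime(clean["Date"], format="%d %B %Y")

# Assumption: every race time looks like "H:MM:SS.mmm"; values that cannot be
# parsed become NaT instead of raising an error.
clean["Race Time"] = pd.to_timedelta(clean["Race Time"], errors="coerce")

# Sort oldest-first so later steps can rely on chronological order.
clean = clean.sort_values("Date").reset_index(drop=True)
```

With real dates and durations in place, the races can be ordered in time and the winning times compared numerically, for example through `clean["Race Time"].dt.total_seconds()`.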

Data Dictionary¶

Feature Name  Definition                                      Data Type  Units of Measurement
Grand Prix    Name of the race                                String     N/A
Circuit       Name of the track                               String     N/A
Date          Date the race was held                          DateTime   D Month YYYY (e.g. 20 March 2022)
Winner        Name of the winning driver                      String     N/A
Team          Name of the winning team                        String     N/A
Laps          Number of laps completed by the winning driver  Integer    Laps
Race Time     Winner's time to finish the race                Duration   H:MM:SS.mmm

Potential Problems¶

The data is an imperfect measure of ability, because races are often won for reasons other than merit or on-track performance. Moreover, the driver lineup changes frequently and new drivers join the grid, so we need a way to track the performance of drivers with little or no history. The prediction can never be 100% correct, and we are using only basic metrics, but we could incorporate other aspects of a Formula 1 race, such as average team experience and funding.
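
One possible way to handle drivers with no history, sketched here under assumptions rather than as the project's chosen method, is to count each driver's wins in earlier races only, so that a driver appearing for the first time simply starts at zero. The function name `prior_win_counts` is hypothetical, and the sample rows are the five 2022 races shown in the output earlier.

```python
from collections import defaultdict
from datetime import datetime


def prior_win_counts(races):
    """For each race, return how many earlier races its winner had already won.

    `races` is a list of (date_string, winner) pairs. Dates are assumed to
    look like "20 March 2022".
    """
    # Process races oldest-first so each count uses only past information.
    ordered = sorted(races, key=lambda r: datetime.strptime(r[0], "%d %B %Y"))
    wins_so_far = defaultdict(int)  # drivers not seen before default to 0 wins
    result = []
    for date, winner in ordered:
        result.append((date, winner, wins_so_far[winner]))
        wins_so_far[winner] += 1
    return result


races = [
    ("20 March 2022", "Charles Leclerc"),
    ("27 March 2022", "Max Verstappen"),
    ("10 April 2022", "Charles Leclerc"),
    ("24 April 2022", "Max Verstappen"),
    ("8 May 2022", "Max Verstappen"),
]
history = prior_win_counts(races)
```

Because the count for each race is recorded before that race's result is added, the feature never uses information from the race it is meant to help predict.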

Method:¶

We will use a k-nearest neighbors (KNN) classifier to predict the winning driver from the other features of the dataset. We can also run a regression analysis as a complementary way to estimate the winner. KNN relies on a distance measure: Euclidean distance combines multiple numeric attributes into a single figure that says how similar two races are, and the classifier uses that figure to make its prediction.
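
To make the idea concrete, here is a minimal sketch of a KNN classifier written from scratch with Euclidean distance and a majority vote. It is an illustration under assumptions, not the project's final model: the function `knn_predict`, the two numeric features per row, the labels `Driver A` and `Driver B`, and all the numbers are made up for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest rows."""
    # Euclidean distance turns all numeric attributes into one number per row.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest rows and vote on their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training rows: two made-up numeric features per past race,
# labelled with the (made-up) winner of that race.
train_points = [(5.0, 8.0), (6.0, 9.0), (4.0, 7.0), (1.0, 2.0), (0.0, 1.0), (2.0, 1.0)]
train_labels = ["Driver A", "Driver A", "Driver A", "Driver B", "Driver B", "Driver B"]

prediction_near_a = knn_predict(train_points, train_labels, (5.0, 7.0), k=3)
prediction_near_b = knn_predict(train_points, train_labels, (1.0, 1.0), k=3)
```

Euclidean distance is sensitive to scale, so features measured in different units (for example laps and seconds) would normally be standardized before being passed to a classifier like this one.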