Formula 1 Race Prediction - Aashu Kedia¶

Motivation:¶

Problem¶

Determining the winner of a Formula 1 race based on features such as driver performance, car specifications, and race circuits. This can help teams see how their drivers are likely to perform against others, based on the historical performance of drivers, cars, and tracks. The motivation behind this project is to help teams, fans, and stakeholders in the sport make better predictions and better-informed decisions.

Solution¶

Formula 1 is one of the most popular racing series in the world, and its recent growth is often attributed to the popularity of Drive to Survive. The scope of the available data allows for a reasonable prediction algorithm. The goal is to identify the likely winner of a race based on past results and performance.

Impact¶

The impact of this project can be significant. Predicting race results accurately can help teams optimize their strategies, improve their chances of winning, and make informed decisions about car design and setup. Fans can also benefit from accurate predictions, as they can make more informed bets, participate in fantasy leagues, and enjoy a more engaging viewing experience. Additionally, stakeholders in the sport, such as broadcasters, sponsors, and organizers, can use the predictions to enhance the overall experience of the sport and attract more viewership and investment. Overall, this project can contribute to the advancement of the sport and the growth of its fanbase.

Dataset¶

Detail¶

We will use a Kaggle dataset of Formula 1 race data with the following columns:

Grand Prix
Circuit
Date
Winner
Team
Laps
Race Time

Here's the link to view the table data and headers: https://ibb.co/rGSFRQp

Our project will track team and driver performance and organize it by race and driver to predict the potential winner.

In [1]:

import numpy as np
import pandas as pd

In [4]:

data = pd.read_csv("F1_Seasons_champions.csv")
data

Out[4]:

	Unnamed: 0	Grand Prix	Circuit	Date	Winner	Team	Laps	Race Time
0	0	Bahrain	Bahrain International Circuit	20 March 2022	Charles Leclerc	Ferrari	57	1:37:33.584
1	1	Saudi Arabia	Jeddah Corniche Circuit	27 March 2022	Max Verstappen	Red Bull RBPT	50	1:24:19.293
2	2	Australia	Albert Park Circuit	10 April 2022	Charles Leclerc	Ferrari	58	1:27:46.548
3	3	Emilia Romagna	Autodromo Enzo e Dino Ferrari	24 April 2022	Max Verstappen	Red Bull RBPT	63	1:32:07.986
4	4	Miami	Miami International Autodrome	8 May 2022	Max Verstappen	Red Bull RBPT	57	1:34:24.258
...	...	...	...	...	...	...	...	...
216	216	South Korea	Korean International Circuit	14 October 2012	Sebastian Vettel	Red Bull Renault	55	1:36:28.651
217	217	India	Buddh International Circuit	28 October 2012	Sebastian Vettel	Red Bull Renault	60	1:31:10.744
218	218	Abu Dhabi	Yas Marina Circuit	4 November 2012	Kimi Räikkönen	Lotus Renault	55	1:45:58.667
219	219	United States	Circuit of The Americas	18 November 2012	Lewis Hamilton	McLaren Mercedes	56	1:35:55.269
220	220	Brazil	Autódromo José Carlos Pace	25 November 2012	Jenson Button	McLaren Mercedes	71	1:45:22.656

221 rows × 8 columns

Data Dictionary¶

Feature Name	Definition	Data Type	Units of Measurement
Grand Prix	Name of the race	String	N/A
Circuit	Name of the track	String	N/A
Date	Date the race was held	DateTime	D Month YYYY (e.g. 20 March 2022)
Winner	Name of the winning driver	String	N/A
Team	Name of the winning team	String	N/A
Laps	Number of laps completed by the winning driver	Integer	Laps
Race Time	Time the winner took to finish the race	Duration	H:MM:SS.mmm (e.g. 1:37:33.584)
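Both Date and Race Time are read from the CSV as plain strings, so they likely need converting before any modelling. The sketch below shows one way to do that with pandas, using three rows copied from the output above. The `sample` frame and the `race_minutes` column are names introduced here for illustration only; in the notebook the same calls would presumably be applied to `data`.

```python
import pandas as pd

# Three rows copied from the table above; `sample` is a stand-in for `data`.
sample = pd.DataFrame({
    "Date": ["20 March 2022", "27 March 2022", "10 April 2022"],
    "Laps": [57, 50, 58],
    "Race Time": ["1:37:33.584", "1:24:19.293", "1:27:46.548"],
})

# Parse "20 March 2022" style strings into real timestamps.
sample["Date"] = pd.to_datetime(sample["Date"], format="%d %B %Y")

# Parse "H:MM:SS.mmm" strings into durations.
sample["Race Time"] = pd.to_timedelta(sample["Race Time"])

# Assumed convenience column: the duration expressed in minutes as a float.
sample["race_minutes"] = sample["Race Time"].dt.total_seconds() / 60

print(sample.dtypes)
```

Storing the duration as a float number of minutes is just one option; it keeps the value usable as a numeric feature later on.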

Potential Problems¶

The data is not a perfect measure of ability, since races are often decided by factors other than driver merit or performance. Moreover, the driver lineup changes frequently and new drivers join every season, so we need a way to track the performance of new drivers. Of course, the prediction can never be 100% correct, and we are only using basic metrics, but we could incorporate other aspects of a Formula 1 race, such as average team experience and funding.
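One possible way to track driver performance over time, including drivers with no history, is a running count of wins before each race. The sketch below is only an assumption about how such a feature could be built: the `races` frame holds the first five 2022 rows from the table above, and `prior_wins` is a hypothetical column name. A driver with no earlier win in the data simply gets 0.

```python
import pandas as pd

# First five 2022 races from the table above.
races = pd.DataFrame({
    "Date": pd.to_datetime([
        "2022-03-20", "2022-03-27", "2022-04-10", "2022-04-24", "2022-05-08",
    ]),
    "Winner": [
        "Charles Leclerc", "Max Verstappen", "Charles Leclerc",
        "Max Verstappen", "Max Verstappen",
    ],
})

# Sort chronologically so that "prior" really means "earlier in time".
races = races.sort_values("Date").reset_index(drop=True)

# cumcount numbers each driver's rows 0, 1, 2, ... in order, which equals
# the number of wins that driver had before the current race.
races["prior_wins"] = races.groupby("Winner").cumcount()

print(races)
```

Because the dataset only lists winners, this count says nothing about drivers who have never won, which is part of the limitation described above.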

Method:¶

We will approach the problem with a KNN classifier, which predicts the winning driver from the other features in the dataset. We can also run a regression analysis to help estimate the winner. Euclidean distance can serve as the distance metric, since it combines multiple attributes into a single figure that measures how similar two races are.
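To make the KNN idea concrete, here is a minimal sketch written with only the standard library so that the Euclidean distance step is visible. It is not the project's final model. The helper names `euclidean` and `knn_predict` are made up for this example, and the two features (laps, race time in minutes) are taken from the first five 2022 rows purely to show the mechanics; they are not claimed to be good predictors of the winner.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Combine the per-feature differences into a single distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Return the most common label among the k nearest training rows."""
    distances = sorted(
        (euclidean(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Features: (laps, race time in minutes), rounded from the table above.
train_X = [
    (57, 97.56),  # Bahrain
    (50, 84.32),  # Saudi Arabia
    (58, 87.78),  # Australia
    (63, 92.13),  # Emilia Romagna
    (57, 94.40),  # Miami
]
train_y = [
    "Charles Leclerc", "Max Verstappen", "Charles Leclerc",
    "Max Verstappen", "Max Verstappen",
]

# Hypothetical query race: 56 laps, about 90 minutes.
prediction = knn_predict(train_X, train_y, (56, 90.0), k=3)
print(prediction)
```

In practice the features would usually be scaled first, since Euclidean distance is dominated by whichever attribute has the largest numeric range.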