Formula One has recently grown in popularity in the United States, thanks in large part to the Netflix series 'Drive to Survive', which follows each team and driver throughout a season. Race-day strategy might seem simple: whoever drives fastest around the track wins. In practice, much more goes into a race win than raw speed, including tire compound, pit stop timing, and starting position on the grid.

A standard F1 race is spread across a weekend (Friday to Sunday). Teams practice on Friday to get a feel for the car and the track conditions. On Saturday they go through what is known as qualifying, where drivers compete to set the fastest lap around the track. The ranking of those laps from fastest to slowest decides the starting grid for the Sunday race.

It is usually best to start the race in first place (pole position), but some factors could change which starting position is best. For instance, the Bahrain International Circuit and Spa-Francorchamps are both known for long straights that make overtaking easier. At these circuits it might be advantageous to start second or third, because overtaking might prove easier than defending pole position.

In this project, I plan to analyze race outcomes to see how they correlate with starting positions. I hope to answer the following questions through this research: How does starting position affect a driver's ability to win the race? Are there any tracks where starting somewhere other than pole has been more effective at producing race wins? Is there a particular driver or team that has had more success when starting from a worse grid position?
import pandas as pd
df_results = pd.read_csv('results.csv')
df_races = pd.read_csv('races.csv')
df_drivers = pd.read_csv('drivers.csv')
df_constructors = pd.read_csv('constructors.csv')
df_circuits = pd.read_csv('circuits.csv')
df_results
dict_results = {'resultid': 'particular result being analyzed',
'raceid': 'value given from df_races that refers to specific race',
'driverid': 'driver whose result is being analyzed', 'constructorid': 'team that driver races for',
'number': 'number of driver being analyzed', 'grid': 'position in starting grid',
'position': 'position where driver finished the race', 'positionText': 'finishing position as text',
'positionOrder': 'final position after possible penalties assigned', 'points': 'points awarded',
'laps': 'num of laps completed during race', 'time': 'total race time',
'milliseconds': 'time in milliseconds', 'fastestLap': 'lap on which this driver set their quickest time',
'rank': 'ranking of fastest lap', 'fastestLapTime': 'time of fastest lap',
'fastestLapSpeed': 'average speed in km/h achieved during fastest lap',
'statusid': 'refers to id number associated with certain race circumstances (failure, crashes, etc)'}
df_races
dict_race = {'raceid': "particular race", 'year': 'year of race', 'round': 'which round of the season',
'name': 'title of race', 'date': 'date of race', 'time': 'race start time UTC',
'url': 'website with all race information', 'fp1_date': 'date of first practice session',
'fp1_time': 'UTC start time of first practice session', 'fp2_date': 'date of second practice session',
'fp2_time': 'UTC start time of second practice session', 'fp3_date': 'date of third practice session',
'fp3_time': 'UTC start time of third practice session', 'quali_date': 'date of qualifying session',
'quali_time': 'UTC start time of qualifying session', 'sprint_date': 'date of F1 sprint',
'sprint_time': 'UTC start time of F1 sprint'}
df_drivers
dict_drivers = {'driverid': 'number assigned to specific driver', 'driverRef': 'reference name of driver',
'number': 'number worn by this driver', 'forename': 'first name of the driver',
'surname': 'last name of the driver', 'dob': 'date of birth of driver',
'nationality': 'nationality of driver', 'url': 'link to website with all driver info'}
df_constructors
dict_constructors = {'constructorid': 'specific id given to this constructor',
'constructorRef': 'reference name given to the constructor', 'name': 'name of constructor',
'nationality': 'home nation of constructor', 'url': 'website with all information on constructor'}
df_circuits
dict_circuits = {'circuitid': 'specific id number given to circuit', 'circuitRef': 'reference name given to circuit',
'name': 'actual name of circuit', 'location': 'city of circuit', 'country': 'country of circuit',
'lat': 'latitude of circuit', 'lng': 'longitude of circuit', 'alt': 'altitude of circuit',
'url': 'website with all given information of circuit'}
I will look at starting position, driver, constructor, and finishing position to see how starting position affects the outcomes of particular drivers and their cars. In particular, I will count how often each starting position occurs and which finishing positions result from it.
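As a rough sketch of that frequency count, the two helpers below tabulate starting position against finishing position and compute the share of starts from each grid slot that ended in a win. The function names are my own and purely illustrative. The column names `grid` and `positionOrder` follow `dict_results`, and the `grid > 0` filter assumes that a grid value of 0 marks a pit-lane start or a missing grid slot, which should be checked against the data before relying on it.

```python
import pandas as pd


def grid_vs_finish(results: pd.DataFrame) -> pd.DataFrame:
    """Count how often each starting position led to each finishing position."""
    # Assumption: grid == 0 means a pit-lane start or missing grid slot, so drop it.
    started = results[results["grid"] > 0]
    # Rows are starting positions, columns are finishing positions, cells are counts.
    return pd.crosstab(started["grid"], started["positionOrder"])


def win_share_by_grid(results: pd.DataFrame) -> pd.Series:
    """Fraction of starts from each grid slot that ended in a win."""
    # Same assumption as above about grid == 0.
    started = results[results["grid"] > 0]
    # True where the driver was classified first; the mean of booleans is a share.
    won = started["positionOrder"] == 1
    return won.groupby(started["grid"]).mean()
```

Calling `win_share_by_grid(df_results).head(10)` would then give the win share for the first ten grid slots, and `grid_vs_finish(df_results)` the full table of counts.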