Vehicle crashes account for almost 40 thousand deaths in the United States every year. To help prevent further deaths, it is useful to understand trends in factors such as vehicle type, crash cause, and location, and to determine whether particular values of these factors are more strongly associated with crashes.
Using data collected by the City of New York, which is continually updated and publicly available, we can analyze trends in these factors. As the most populous city in the United States, New York unfortunately sees a large number of crashes, but that also gives us a large sample to work with. The goal of this project is to analyze multiple factors simultaneously and draw conclusions about which factors are associated with the most car crashes.
As stated above, car crashes cause a large number of deaths every year. In fact, they are the world's 8th leading cause of death. A clearer picture of when, where, and why crashes happen, and of which factors contribute most to them, could be hugely beneficial for crash reduction and could save hundreds, maybe even thousands, of lives.
Motor Vehicle Collisions - Crashes (New York)
Using this dataset, we can identify several factors that can help us in our analysis:
import pandas as pd
crash_df = pd.read_csv("Motor_Vehicle_Collisions_-_Crashes.csv", low_memory=False)
crash_df.head()
| | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | ... | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 09/11/2021 | 2:39 | NaN | NaN | NaN | NaN | NaN | WHITESTONE EXPRESSWAY | 20 AVENUE | NaN | ... | Unspecified | NaN | NaN | NaN | 4455765 | Sedan | Sedan | NaN | NaN | NaN |
1 | 03/26/2022 | 11:45 | NaN | NaN | NaN | NaN | NaN | QUEENSBORO BRIDGE UPPER | NaN | NaN | ... | NaN | NaN | NaN | NaN | 4513547 | Sedan | NaN | NaN | NaN | NaN |
2 | 06/29/2022 | 6:55 | NaN | NaN | NaN | NaN | NaN | THROGS NECK BRIDGE | NaN | NaN | ... | Unspecified | NaN | NaN | NaN | 4541903 | Sedan | Pick-up Truck | NaN | NaN | NaN |
3 | 09/11/2021 | 9:35 | BROOKLYN | 11208 | 40.667202 | -73.866500 | (40.667202, -73.8665) | NaN | NaN | 1211 LORING AVENUE | ... | NaN | NaN | NaN | NaN | 4456314 | Sedan | NaN | NaN | NaN | NaN |
4 | 12/14/2021 | 8:13 | BROOKLYN | 11233 | 40.683304 | -73.917274 | (40.683304, -73.917274) | SARATOGA AVENUE | DECATUR STREET | NaN | ... | NaN | NaN | NaN | NaN | 4486609 | NaN | NaN | NaN | NaN | NaN |
5 rows × 29 columns
By computing and plotting covariance and correlation statistics, fitting a linear regression model on multiple factors, or drawing bar charts to show trends, we can better understand which factors contribute most to car crashes in New York and, with a bigger dataset, possibly in the United States as a whole.
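As a rough illustration of the bar-chart part of this plan, the sketch below counts crashes per category and per hour of day. It is only a minimal example under stated assumptions: `sample_df` is a tiny made-up frame that borrows three column names from the table above, and the helpers `crash_counts` and `crashes_by_hour` are hypothetical names introduced here, not part of the project. Its counts say nothing about the real data.

```python
import pandas as pd

# Tiny made-up sample that mimics a few columns of the crash data.
# These rows are illustrative only, not taken from the real dataset.
sample_df = pd.DataFrame({
    "CRASH TIME": ["2:39", "11:45", "6:55", "9:35", "8:13", "8:40"],
    "BOROUGH": [None, None, "QUEENS", "BROOKLYN", "BROOKLYN", "BRONX"],
    "VEHICLE TYPE CODE 1": ["Sedan", "Sedan", "Sedan", "Taxi", None, "Sedan"],
})


def crash_counts(df, column):
    """Count crashes per category in `column`, ignoring missing values."""
    return df[column].value_counts(dropna=True)


def crashes_by_hour(df):
    """Count crashes per hour of day, parsed from 'H:MM' strings."""
    hours = df["CRASH TIME"].str.split(":").str[0].astype(int)
    return hours.value_counts().sort_index()


borough_counts = crash_counts(sample_df, "BOROUGH")
vehicle_counts = crash_counts(sample_df, "VEHICLE TYPE CODE 1")
hourly_counts = crashes_by_hour(sample_df)

print(borough_counts)
print(vehicle_counts)
print(hourly_counts)
```

Swapping `sample_df` for `crash_df` should give the counts for the full dataset, assuming every `CRASH TIME` value is present and follows the `H:MM` pattern seen in the preview. With matplotlib installed, calling `.plot(kind="bar")` on any of these Series would draw the corresponding bar chart.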
The obvious limitations of this methodology and data are that: