Crashes? Where and Why?

Motivation:

Problem

Vehicle crashes account for almost 40,000 deaths in the United States every year. To help prevent further deaths, it is useful to understand how factors such as vehicle type, contributing cause, and location relate to crashes, and whether particular values of these factors are associated with more of them.

Solution

Data collected by the City of New York lets us analyze trends in these factors, since it is continually updated and publicly available. As the most populous city in the United States, New York unfortunately sees a large number of crashes, which also gives us a large sample to work with. The goal of this project is to analyze several factors together and draw conclusions about which of them are associated with the most car crashes.

Impact

As noted above, car crashes cause tens of thousands of deaths per year in the United States. Worldwide, they are the 8th leading cause of death. Better knowledge of when, where, and why crashes happen, and of which factors contribute most to them, could help reduce crashes and save hundreds, maybe even thousands, of lives.

CDC Data on Motor Injuries

IIHS Info on Fatalities

Crash Dataset

Motor Vehicle Collisions - Crashes (New York)

Using this dataset, we can identify several fields that will help in our analysis:

  • Crash Time
  • Borough
  • Longitude
  • Latitude
  • Street Names
  • Vehicle Types
In [7]:
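import pandas as pd
# low_memory=False parses the file in one pass, avoiding mixed-type inference within a column
crash_df = pd.read_csv("Motor_Vehicle_Collisions_-_Crashes.csv", low_memory=False)
# Preview the first five rows
crash_df.head()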
Out[7]:
| | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | ... | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 09/11/2021 | 2:39 | NaN | NaN | NaN | NaN | NaN | WHITESTONE EXPRESSWAY | 20 AVENUE | NaN | ... | Unspecified | NaN | NaN | NaN | 4455765 | Sedan | Sedan | NaN | NaN | NaN |
| 1 | 03/26/2022 | 11:45 | NaN | NaN | NaN | NaN | NaN | QUEENSBORO BRIDGE UPPER | NaN | NaN | ... | NaN | NaN | NaN | NaN | 4513547 | Sedan | NaN | NaN | NaN | NaN |
| 2 | 06/29/2022 | 6:55 | NaN | NaN | NaN | NaN | NaN | THROGS NECK BRIDGE | NaN | NaN | ... | Unspecified | NaN | NaN | NaN | 4541903 | Sedan | Pick-up Truck | NaN | NaN | NaN |
| 3 | 09/11/2021 | 9:35 | BROOKLYN | 11208 | 40.667202 | -73.866500 | (40.667202, -73.8665) | NaN | NaN | 1211 LORING AVENUE | ... | NaN | NaN | NaN | NaN | 4456314 | Sedan | NaN | NaN | NaN | NaN |
| 4 | 12/14/2021 | 8:13 | BROOKLYN | 11233 | 40.683304 | -73.917274 | (40.683304, -73.917274) | SARATOGA AVENUE | DECATUR STREET | NaN | ... | NaN | NaN | NaN | NaN | 4486609 | NaN | NaN | NaN | NaN | NaN |

5 rows × 29 columns
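The crash date and time appear as separate text columns in the preview, so any time-of-day analysis would need them parsed first. The following is a minimal sketch of one way to do that, not the project's actual preprocessing. `sample_df` is a hypothetical stand-in built from the five rows shown above, and `CRASH DATETIME` and `CRASH HOUR` are column names introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for crash_df: the five previewed rows,
# restricted to a few of the columns of interest.
sample_df = pd.DataFrame({
    "CRASH DATE": ["09/11/2021", "03/26/2022", "06/29/2022", "09/11/2021", "12/14/2021"],
    "CRASH TIME": ["2:39", "11:45", "6:55", "9:35", "8:13"],
    "BOROUGH": [None, None, None, "BROOKLYN", "BROOKLYN"],
    "VEHICLE TYPE CODE 1": ["Sedan", "Sedan", "Sedan", "Sedan", None],
})

# Join date and time into one string and parse it as a timestamp.
# The format string assumes month/day/year dates and 24-hour times.
sample_df["CRASH DATETIME"] = pd.to_datetime(
    sample_df["CRASH DATE"] + " " + sample_df["CRASH TIME"],
    format="%m/%d/%Y %H:%M",
)

# Extract the hour of day (0-23) for later grouping.
sample_df["CRASH HOUR"] = sample_df["CRASH DATETIME"].dt.hour
```

The same two assignments should carry over to the full `crash_df`, provided its date and time columns follow the format seen in the preview.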

Takeaway and Methods

By computing covariance and correlation statistics and plotting them, fitting a linear regression model on multiple factors, and drawing bar charts to show trends, we can better understand which factors contribute most to car crashes in New York and, with a larger dataset, possibly in the United States as a whole.
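As a rough sketch of what the counting and correlation steps might look like, the example below tallies crashes per borough and per hour, then computes a Pearson correlation between hour and crash count. The rows in `toy_df` are made up purely to show the mechanics and are not taken from the dataset. The names `toy_df`, `CRASH HOUR`, and `CRASHES` are assumptions introduced for this illustration.

```python
import pandas as pd

# Made-up toy rows (NOT real data) with one row per crash.
toy_df = pd.DataFrame({
    "BOROUGH": ["BROOKLYN", "BROOKLYN", "QUEENS", "BRONX", "BROOKLYN", "QUEENS"],
    "CRASH HOUR": [8, 9, 8, 17, 17, 17],
})

# Crashes per borough: the numbers a bar chart would display.
crashes_per_borough = toy_df["BOROUGH"].value_counts()

# Crashes per hour of day, as a two-column frame.
crashes_per_hour = (
    toy_df.groupby("CRASH HOUR").size().reset_index(name="CRASHES")
)

# Pearson correlation between hour of day and number of crashes.
hour_count_corr = crashes_per_hour["CRASH HOUR"].corr(crashes_per_hour["CRASHES"])

print(crashes_per_borough)
print(crashes_per_hour)
print(hour_count_corr)
```

With matplotlib installed, `crashes_per_borough.plot.bar()` would draw the corresponding bar chart. A correlation computed on so few points says nothing about real crash patterns. The value only becomes meaningful on the full dataset.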

Obvious limitations of this methodology and data are:

  • There are gaps in the data. Not every field is recorded for every crash, so the results will be less accurate than desired.
  • New York is not an accurate representation of the US as a whole. As a dense urban city, it reflects only that environment and therefore can't be used as a gauge for all living environments.
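The first limitation can be quantified before any analysis. The sketch below measures the share of missing values per column and keeps only the rows that have coordinates. `head_df` is a hypothetical stand-in holding a few columns from the five previewed rows, so the shares it yields describe only that preview, not the full dataset.

```python
import pandas as pd

# Hypothetical stand-in: selected columns from the five previewed rows.
head_df = pd.DataFrame({
    "BOROUGH": [None, None, None, "BROOKLYN", "BROOKLYN"],
    "LATITUDE": [None, None, None, 40.667202, 40.683304],
    "LONGITUDE": [None, None, None, -73.866500, -73.917274],
    "VEHICLE TYPE CODE 1": ["Sedan", "Sedan", "Sedan", "Sedan", None],
})

# Fraction of missing values in each column, largest first.
missing_share = head_df.isna().mean().sort_values(ascending=False)

# Rows usable for location-based analysis: both coordinates present.
located_df = head_df.dropna(subset=["LATITUDE", "LONGITUDE"])

print(missing_share)
print(len(located_df), "of", len(head_df), "rows have coordinates")
```

Running the same two lines on `crash_df` would show which factors have enough coverage to support conclusions.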