Boston Crime Prediction¶

Motivation:¶

Problem¶

Crime is one of the major concerns for many cities and communities around the world. Crime rates are rising across the country, and it is getting harder to combat different types of crime. Police have been looking for strategies and techniques that could help prevent crime.

Solution¶

Police departments provide crime reports each year. These reports contain the necessary information about every crime reported. If the reports are analyzed properly, artificial intelligence could help predict serious violent crimes. The goal of this project is to create a model that analyzes and identifies the relationship between neighborhoods, the type of crime, and the number of incidents reported.

Impact¶

If successful, this model could predict the crimes that are most likely to occur in particular neighborhoods in Boston and help officers identify the areas where crimes are more likely to occur. By compiling and analyzing data from multiple sources, predictive methods identify patterns and generate recommendations about where crimes are likely to occur.

Dataset¶

Detail¶

We will use the Boston Police Department Crime Incident Reports (August 2015 - To Date) to observe the following fields of the crime reports:

  • Offense Code
  • Offense Description
  • District
  • Reporting Area
  • Shooting
  • Occurred on Date
  • Year
  • Month
  • Day of Week
  • Hour
  • UCR Part
  • Street
  • Lat
  • Long
  • Location
In [12]:
import pandas as pd
# low_memory=False reads the file in one pass so pandas infers one dtype per column
# (this avoids the DtypeWarning about mixed types in column 0).
reader = pd.read_csv('crime2022.csv', low_memory=False)
# Drop the columns that are not used as features.
reader.drop(columns=['OFFENSE_CODE_GROUP', 'INCIDENT_NUMBER'], inplace=True)
reader.head()
Out[12]:
|   | OFFENSE_CODE | OFFENSE_DESCRIPTION | DISTRICT | REPORTING_AREA | SHOOTING | OCCURRED_ON_DATE | YEAR | MONTH | DAY_OF_WEEK | HOUR | UCR_PART | STREET | Lat | Long | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | LARCENY ALL OTHERS | D4 | 167 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | HARRISON AVE | 42.339542 | -71.069409 | (42.33954198983014, -71.06940876967543) |
| 1 | 2670 | HARASSMENT/ CRIMINAL HARASSMENT | A7 |  | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | BENNINGTON ST | 42.377246 | -71.032597 | (42.37724638479816, -71.0325970804128) |
| 2 | 3201 | PROPERTY - LOST/ MISSING | D14 | 778 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | WASHINGTON ST | 42.349056 | -71.150498 | (42.34905600030506, -71.15049849975023) |
| 3 | 3201 | PROPERTY - LOST/ MISSING | B3 | 465 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | BLUE HILL AVE | 42.284826 | -71.091374 | (42.28482576580488, -71.09137368938802) |
| 4 | 3201 | PROPERTY - LOST/ MISSING | B3 | 465 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | BLUE HILL AVE | 42.284826 | -71.091374 | (42.28482576580488, -71.09137368938802) |

Our project seeks to use the features above to identify the areas where different crimes are more likely to occur. District, reporting area, date (year, month, day of week, hour), street, latitude, longitude, and location describe in detail where and when the crimes occurred. Offense description, shooting, UCR part, and offense code describe in detail what crimes occurred.
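The "where and when" versus "what" grouping described above can be sketched in pandas. This is a minimal illustration, not part of the original notebook: the constant names `WHERE_WHEN_COLS` and `WHAT_COLS` and the helper `split_features` are hypothetical, and the two sample rows are copied from the first two rows of the output shown earlier. Storing `REPORTING_AREA` as a string, with an empty string for the blank value, is an assumption.

```python
import pandas as pd

# Column groups as described in the text (names of these constants are hypothetical).
WHERE_WHEN_COLS = ["DISTRICT", "REPORTING_AREA", "OCCURRED_ON_DATE", "YEAR", "MONTH",
                   "DAY_OF_WEEK", "HOUR", "STREET", "Lat", "Long", "Location"]
WHAT_COLS = ["OFFENSE_CODE", "OFFENSE_DESCRIPTION", "SHOOTING", "UCR_PART"]


def split_features(df: pd.DataFrame):
    """Return (where_when, what) views of the crime table as separate DataFrames."""
    where_when = df[WHERE_WHEN_COLS].copy()
    # Parse the timestamp so it can be used for time-based grouping later.
    where_when["OCCURRED_ON_DATE"] = pd.to_datetime(
        where_when["OCCURRED_ON_DATE"], errors="coerce"
    )
    what = df[WHAT_COLS].copy()
    return where_when, what


# Two rows taken from the head() output above.
sample = pd.DataFrame({
    "OFFENSE_CODE": [619, 2670],
    "OFFENSE_DESCRIPTION": ["LARCENY ALL OTHERS", "HARASSMENT/ CRIMINAL HARASSMENT"],
    "DISTRICT": ["D4", "A7"],
    "REPORTING_AREA": ["167", ""],  # assumption: stored as text, blank for row 1
    "SHOOTING": [0, 0],
    "OCCURRED_ON_DATE": ["2022-01-01 00:00:00", "2022-01-01 00:00:00"],
    "YEAR": [2022, 2022],
    "MONTH": [1, 1],
    "DAY_OF_WEEK": ["Saturday", "Saturday"],
    "HOUR": [0, 0],
    "UCR_PART": [None, None],
    "STREET": ["HARRISON AVE", "BENNINGTON ST"],
    "Lat": [42.339542, 42.377246],
    "Long": [-71.069409, -71.032597],
    "Location": ["(42.33954198983014, -71.06940876967543)",
                 "(42.37724638479816, -71.0325970804128)"],
})

where_when, what = split_features(sample)
```

On the full dataset, the same helper could be called on `reader` in place of `sample`.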

Potential Problems¶

Our assumption is that everything in the crime reports is accurate. However, the descriptions police enter when reporting are sometimes a source of inaccuracy in statistical crime records. The time the incident was reported could be wrong or the individual filling
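One way to probe the accuracy assumption is to flag rows whose fields disagree with each other. The sketch below is an assumption about what such a check might look like, not something the original notebook does: the helper `flag_suspect_rows` is hypothetical, and the sample rows contain deliberately altered values (a wrong hour, a missing latitude, an unparseable timestamp) purely for illustration.

```python
import pandas as pd


def flag_suspect_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return one boolean column per consistency check, aligned with df's index."""
    ts = pd.to_datetime(df["OCCURRED_ON_DATE"], errors="coerce")
    return pd.DataFrame({
        # Timestamp could not be parsed at all.
        "bad_timestamp": ts.isna(),
        # HOUR column disagrees with the hour in the timestamp.
        "hour_mismatch": ts.dt.hour.ne(df["HOUR"]),
        # DAY_OF_WEEK column disagrees with the weekday of the timestamp.
        "weekday_mismatch": ts.dt.day_name().ne(df["DAY_OF_WEEK"]),
        # Either coordinate is missing.
        "missing_coords": df["Lat"].isna() | df["Long"].isna(),
    }, index=df.index)


# Illustrative rows only: rows 1 and 2 are intentionally corrupted.
sample = pd.DataFrame({
    "OCCURRED_ON_DATE": ["2022-01-01 00:00:00", "2022-01-01 00:00:00", "not a date"],
    "HOUR": [0, 5, 0],
    "DAY_OF_WEEK": ["Saturday", "Saturday", "Saturday"],
    "Lat": [42.339542, None, 42.349056],
    "Long": [-71.069409, -71.032597, -71.150498],
})

flags = flag_suspect_rows(sample)
suspect = sample[flags.any(axis=1)]  # rows failing at least one check
```

Checks like these only catch internal inconsistencies. They cannot detect a report that is wrong but self-consistent.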

Method¶

We will categorize the crimes based on offense code and offense description. Doing this allows us to discover which crimes occurred the most and where. This way we can see whether crimes of a similar severity tend to occur in the same places. The model would group the most frequently committed crimes with the locations where they were committed. This way police could be on the lookout and be prepared for a certain type of crime in areas with a high number of incidents.
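The categorization step could be approached as a simple count of incidents per district and offense. The following is a minimal sketch under that assumption, not the project's final model: the helper `top_offenses_by_district` and the `INCIDENTS` column name are hypothetical, and the sample uses the five rows shown in the output above.

```python
import pandas as pd


def top_offenses_by_district(df: pd.DataFrame, n: int = 1) -> pd.DataFrame:
    """Return the n most frequent offenses in each district with their counts."""
    counts = (
        df.groupby(["DISTRICT", "OFFENSE_CODE", "OFFENSE_DESCRIPTION"])
          .size()
          .reset_index(name="INCIDENTS")
    )
    # Within each district, put the most frequent offense first.
    counts = counts.sort_values(["DISTRICT", "INCIDENTS"], ascending=[True, False])
    return counts.groupby("DISTRICT").head(n).reset_index(drop=True)


# The five rows from the head() output above.
sample = pd.DataFrame({
    "OFFENSE_CODE": [619, 2670, 3201, 3201, 3201],
    "OFFENSE_DESCRIPTION": [
        "LARCENY ALL OTHERS",
        "HARASSMENT/ CRIMINAL HARASSMENT",
        "PROPERTY - LOST/ MISSING",
        "PROPERTY - LOST/ MISSING",
        "PROPERTY - LOST/ MISSING",
    ],
    "DISTRICT": ["D4", "A7", "D14", "B3", "B3"],
})

top = top_offenses_by_district(sample, n=1)
```

Replacing `DISTRICT` with `STREET` or `REPORTING_AREA` in the grouping keys would give the same ranking at a finer level of location.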