Crime Type Prediction¶

Motivation:¶

Problem¶

Residents of urban areas often cite crime, which tends to be more frequent than in nearby suburbs, as a major concern. Crime takes various forms in cities, and knowing what type of crime is likely to occur can help police forces prevent it. However, crime is hard to predict. One option is for police to track the activity of past offenders in order to anticipate their future behavior, but this overlooks offenders who have never been reported and raises ethical concerns about privacy.

Solution¶

Boston is one large city where crime is a concern. The Boston Police Department releases crime incident reports each year that record various factors of past crimes. The goal of this project is to identify and use a relationship between the factors involved in a crime (e.g. time of occurrence, location) and the type of crime.

Impact¶

If successful, this work may yield a classifier that predicts the type of crime from the factors surrounding it. Such a tool could help police forces allocate their resources and avoid spending funds on deploying officers to areas where they are not needed. It could also support proactive rather than reactive policing, which may lower crime rates in the long run.

One pitfall of this predictor is that it can cement systemic bias and stigmatize communities. Large cities tend to have higher proportions of minority residents, and labeling certain areas as high risk or low risk can perpetuate bias against these communities.

Dataset¶

Detail¶

We will use a Kaggle dataset of Boston crime up to 2022 to observe the following factors of a crime:

  • incident number
  • offense code
  • offense description
  • district
  • shooting
  • year
  • month
  • day of the week
  • hour
  • street
  • location

| INDEX | INCIDENT_NUMBER | OFFENSE_CODE | OFFENSE_DESCRIPTION | DISTRICT | SHOOTING | YEAR | MONTH | DAY_OF_WEEK | HOUR | STREET | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 225520077 | 3126 | WARRANT ARREST - OUTSIDE OF BOSTON WARRANT | D14 | 0 | 2022 | 2 | Wednesday | 0 | WASHINGTON ST | (42.34308127134165, -71.14172267328729) |
| 1 | 222648862 | 3831 | M/V - LEAVING SCENE - PROPERTY DAMAGE | B2 | 0 | 2022 | 2 | Saturday | 18 | WASHINGTON ST | (42.329748204791635, -71.08454011649543) |
| 2 | 222201764 | 724 | AUTO THEFT | C6 | 0 | 2022 | 1 | Sunday | 0 | W BROADWAY | (42.341287504390436, -71.05467932649397) |
| 3 | 222201559 | 301 | ROBBERY | D4 | 0 | 2022 | 3 | Saturday | 13 | ALBANY ST | (42.333184490911954, -71.07393881002383) |
| 4 | 222111641 | 619 | LARCENY ALL OTHERS | D14 | 0 | 2022 | 2 | Monday | 12 | WASHINGTON ST | (42.34905600030506, -71.15049849975023) |
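
As a rough illustration of how one record from this dataset might be turned into typed values, the sketch below parses a single row using only Python's standard library. The `CrimeRecord` class and the `parse_row` helper are hypothetical names introduced here, not part of the dataset or any library. The column names and sample values come from the rows shown above, and the sketch assumes that `Location` is stored as a string of the form `(lat, lon)` and that `SHOOTING` is a 0/1 flag.

```python
import ast
from dataclasses import dataclass


@dataclass
class CrimeRecord:
    """Hypothetical container for one row of the dataset."""
    incident_number: str
    offense_code: int
    offense_description: str
    district: str
    shooting: bool
    year: int
    month: int
    day_of_week: str
    hour: int
    street: str
    latitude: float
    longitude: float


def parse_row(row: dict) -> CrimeRecord:
    """Convert a dict of raw strings (e.g. from csv.DictReader) into typed fields."""
    # Assumption: Location is a "(lat, lon)" string, as in the sample rows.
    latitude, longitude = ast.literal_eval(row["Location"])
    return CrimeRecord(
        incident_number=row["INCIDENT_NUMBER"],
        offense_code=int(row["OFFENSE_CODE"]),
        offense_description=row["OFFENSE_DESCRIPTION"].strip(),
        district=row["DISTRICT"],
        shooting=row["SHOOTING"] == "1",  # assumption: 0/1 flag
        year=int(row["YEAR"]),
        month=int(row["MONTH"]),
        day_of_week=row["DAY_OF_WEEK"],
        hour=int(row["HOUR"]),
        street=row["STREET"],
        latitude=float(latitude),
        longitude=float(longitude),
    )


# First sample row from the dataset, with every value as a raw string.
sample = {
    "INCIDENT_NUMBER": "225520077",
    "OFFENSE_CODE": "3126",
    "OFFENSE_DESCRIPTION": "WARRANT ARREST - OUTSIDE OF BOSTON WARRANT",
    "DISTRICT": "D14",
    "SHOOTING": "0",
    "YEAR": "2022",
    "MONTH": "2",
    "DAY_OF_WEEK": "Wednesday",
    "HOUR": "0",
    "STREET": "WASHINGTON ST",
    "Location": "(42.34308127134165, -71.14172267328729)",
}
record = parse_row(sample)
print(record.district, record.hour, record.latitude, record.longitude)
```

Splitting `Location` into separate latitude and longitude fields is a design choice made for this sketch; it makes the coordinates usable as numeric features later on.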

Potential Problems¶

One question that arose while looking at this dataset is whether each offense code maps to a single offense description, or whether two crimes can share an offense code but have different descriptions. If codes and descriptions do not correspond one to one, deriving the crime type becomes more complex, since another dataset would probably be needed to cross-check codes against crime names. To investigate this, we will examine a few offense codes and check whether all their descriptions are the same.
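
The check described above could be sketched as follows. This is a minimal illustration, not a settled part of the method: `find_inconsistent_codes` is a hypothetical helper name, and the input is assumed to be an iterable of `(OFFENSE_CODE, OFFENSE_DESCRIPTION)` pairs. The five pairs used here are taken from the sample rows of the dataset.

```python
from collections import defaultdict


def find_inconsistent_codes(pairs):
    """Return {code: sorted descriptions} for codes seen with more than one description."""
    descriptions_by_code = defaultdict(set)
    for code, description in pairs:
        # Strip whitespace so trivially different spellings are not counted twice.
        descriptions_by_code[code].add(description.strip())
    return {
        code: sorted(descriptions)
        for code, descriptions in descriptions_by_code.items()
        if len(descriptions) > 1
    }


# (OFFENSE_CODE, OFFENSE_DESCRIPTION) pairs from the five sample rows.
sample_pairs = [
    (3126, "WARRANT ARREST - OUTSIDE OF BOSTON WARRANT"),
    (3831, "M/V - LEAVING SCENE - PROPERTY DAMAGE"),
    (724, "AUTO THEFT"),
    (301, "ROBBERY"),
    (619, "LARCENY ALL OTHERS"),
]
conflicts = find_inconsistent_codes(sample_pairs)
# An empty dict means every code seen so far has exactly one description.
print(conflicts)
```

Running the same function over every row of the full dataset, rather than a handful of codes, would answer the question for all codes at once. Five rows are of course too few to draw any conclusion.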

Method:¶

We pose our problem as a classification problem: based on the time and location of each crime, we want to predict the type of crime that will take place. Doing so will help us see whether there is a correlation between time and location and the type of crime that occurs.
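
To make the classification framing concrete, here is a minimal sketch of a frequency baseline. The text does not name a model, so this is an assumption chosen for illustration only: for each combination of time and location features, it predicts the offense most often seen with that combination during training, falling back to the overall most common offense. The names `extract_features` and `FrequencyBaseline` are hypothetical, as is the choice of `DISTRICT` and `HOUR` as features and `OFFENSE_DESCRIPTION` as the label.

```python
from collections import Counter, defaultdict


def extract_features(row):
    """Time and location features; using DISTRICT and HOUR is an assumption."""
    return (row["DISTRICT"], int(row["HOUR"]))


class FrequencyBaseline:
    """Predict the most frequent label seen for a given feature combination."""

    def fit(self, rows):
        self.overall = Counter()
        self.by_features = defaultdict(Counter)
        for row in rows:
            label = row["OFFENSE_DESCRIPTION"]
            self.overall[label] += 1
            self.by_features[extract_features(row)][label] += 1
        return self

    def predict(self, row):
        # Fall back to overall label counts for unseen feature combinations.
        counts = self.by_features.get(extract_features(row)) or self.overall
        return counts.most_common(1)[0][0]


# Training rows built from the five sample rows of the dataset.
training_rows = [
    {"DISTRICT": "D14", "HOUR": "0",
     "OFFENSE_DESCRIPTION": "WARRANT ARREST - OUTSIDE OF BOSTON WARRANT"},
    {"DISTRICT": "B2", "HOUR": "18",
     "OFFENSE_DESCRIPTION": "M/V - LEAVING SCENE - PROPERTY DAMAGE"},
    {"DISTRICT": "C6", "HOUR": "0", "OFFENSE_DESCRIPTION": "AUTO THEFT"},
    {"DISTRICT": "D4", "HOUR": "13", "OFFENSE_DESCRIPTION": "ROBBERY"},
    {"DISTRICT": "D14", "HOUR": "12", "OFFENSE_DESCRIPTION": "LARCENY ALL OTHERS"},
]
model = FrequencyBaseline().fit(training_rows)
print(model.predict({"DISTRICT": "D4", "HOUR": "13"}))  # ROBBERY
```

A baseline like this mainly serves as a reference point: any learned classifier would need to beat it on held-out data before a relationship between time, location, and crime type could be claimed.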