Crime Type Prediction

Motivation:

Problem

In urban areas, a major concern raised by residents is the higher rate of crime compared with suburban areas. Crime takes various forms in cities, and knowing which type of crime is likely to occur can help police forces prevent it. The difficulty is that crime is hard to predict: police can track the activity of known past offenders to anticipate their future behavior, but this overlooks offenders who have never been reported, and the tracking itself raises ethical concerns about privacy.

Solution

Boston is one large city where crime is a concern. The Boston Police Department releases crime incident reports for each year, which record various factors of crimes that took place in the past. The goal of this project is to identify and use a relationship between the factors involved in a crime (e.g. time of occurrence, location) and the type of crime.

Impact

If successful, this work may yield a classifier that predicts the type of crime from the factors surrounding it. Such a tool could help police forces allocate their resources and avoid spending funds on deploying officers to areas where they are not needed. It could also support proactive rather than reactive policing, which may lower crime rates in the long run.

One pitfall of such a predictor is that it can cement systemic bias and stigmatize communities. Metropolitan areas have higher percentages of minority residents, and labeling certain areas as high risk or low risk can perpetuate bias against these communities.

Dataset

Detail

We will use a Kaggle dataset of Boston crime reports up to 2022 to observe the following factors of a crime:

incident number
offense code
offense description
district
shooting
year
month
day of the week
hour
street
location

INDEX	INCIDENT_NUMBER	OFFENSE_CODE	OFFENSE_DESCRIPTION	DISTRICT	YEAR	MONTH	DAY_OF_WEEK	HOUR	STREET	Location
0	225520077	3126	WARRANT ARREST - OUTSIDE OF BOSTON WARRANT	D14	2022	2	Wednesday	0	WASHINGTON ST	(42.34308127134165, -71.14172267328729)
1	222648862	3831	M/V - LEAVING SCENE - PROPERTY DAMAGE	B2	2022	2	Saturday	18	WASHINGTON ST	(42.329748204791635, -71.08454011649543)
2	222201764	724	AUTO THEFT	C6	2022	1	Sunday	0	W BROADWAY	(42.341287504390436, -71.05467932649397)
3	222201559	301	ROBBERY	D4	2022	3	Saturday	13	ALBANY ST	(42.333184490911954, -71.07393881002383)
4	222111641	619	LARCENY ALL OTHERS	D14	2022	2	Monday	12	WASHINGTON ST	(42.34905600030506, -71.15049849975023)
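
The Location column in the sample rows above stores latitude and longitude together as one string. A minimal sketch of splitting it into two numeric values follows, assuming every usable value has the `(lat, lon)` form shown in the sample. The helper name `parse_location` is hypothetical, and returning `None` for malformed values is an assumption about how such rows might be handled.

```python
import ast

def parse_location(raw):
    """Convert a '(lat, lon)' string into a tuple of two floats.

    Returns None when the value is missing or not in the expected form
    (assumption: the full dataset may contain such rows).
    """
    try:
        lat, lon = ast.literal_eval(raw.strip())
        return float(lat), float(lon)
    except (ValueError, SyntaxError, TypeError, AttributeError):
        return None

# Value copied from the first sample row above.
print(parse_location("(42.34308127134165, -71.14172267328729)"))
```

Using `ast.literal_eval` rather than `eval` keeps the parsing limited to Python literals, so an unexpected string in the column cannot execute code.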

Potential Problems

One question raised by this dataset is whether each offense code maps to a single offense description, or whether two crimes can share an offense code but have different descriptions. If codes and descriptions do not correspond, deriving the crime type becomes more complex, because another dataset will probably be needed to cross-check codes against the names of the crimes. To assess this problem, we will look at a few offense codes and check whether all of their descriptions are the same.
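
The check described above could be sketched as follows, using only the standard library. The function names are hypothetical, and normalizing descriptions with `strip().upper()` is an assumption about what should count as "the same" description. The last sample row is made up purely to exercise the check and does not come from the dataset.

```python
from collections import defaultdict

def descriptions_per_code(rows):
    """Map each offense code to the set of descriptions seen with it."""
    seen = defaultdict(set)
    for code, description in rows:
        seen[code].add(description.strip().upper())
    return seen

def inconsistent_codes(rows):
    """Return only the codes that appear with more than one description."""
    return {
        code: descs
        for code, descs in descriptions_per_code(rows).items()
        if len(descs) > 1
    }

# (OFFENSE_CODE, OFFENSE_DESCRIPTION) pairs from the sample rows, plus one
# made-up conflicting row for code 619 (hypothetical, for illustration only).
sample = [
    (3126, "WARRANT ARREST - OUTSIDE OF BOSTON WARRANT"),
    (3831, "M/V - LEAVING SCENE - PROPERTY DAMAGE"),
    (724, "AUTO THEFT"),
    (301, "ROBBERY"),
    (619, "LARCENY ALL OTHERS"),
    (619, "HYPOTHETICAL OTHER DESCRIPTION"),
]
print(inconsistent_codes(sample))
```

If the data is loaded into a pandas DataFrame (here called `df`, a hypothetical name), the same count per code is available through `df.groupby("OFFENSE_CODE")["OFFENSE_DESCRIPTION"].nunique()`.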

Method:

We pose our problem as a classification problem: given the time and location of a crime, we want to predict the type of crime that will take place. Doing so will also help us see whether time and location are correlated with the type of crime.
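
To make this framing concrete, here is a minimal sketch of a frequency baseline: for each combination of time and location features, predict the crime type seen most often with it. This is not the model the project commits to, only a simple reference point a real classifier would need to beat. The class name `MajorityBaseline` is hypothetical, and the use of DISTRICT, DAY_OF_WEEK, and HOUR as features with OFFENSE_DESCRIPTION as the label is an assumption.

```python
from collections import Counter, defaultdict

class MajorityBaseline:
    """Predict the most frequent label seen for each feature combination."""

    def fit(self, features, labels):
        self.by_key = defaultdict(Counter)  # per-combination label counts
        self.overall = Counter(labels)      # label counts over all rows
        for key, label in zip(features, labels):
            self.by_key[key][label] += 1
        return self

    def predict(self, features):
        # Unseen combinations fall back to the overall most common label.
        fallback = self.overall.most_common(1)[0][0]
        return [
            self.by_key[key].most_common(1)[0][0] if key in self.by_key else fallback
            for key in features
        ]

# (DISTRICT, DAY_OF_WEEK, HOUR) features and labels from the sample rows.
X = [
    ("D14", "Wednesday", 0),
    ("B2", "Saturday", 18),
    ("C6", "Sunday", 0),
    ("D4", "Saturday", 13),
    ("D14", "Monday", 12),
]
y = [
    "WARRANT ARREST - OUTSIDE OF BOSTON WARRANT",
    "M/V - LEAVING SCENE - PROPERTY DAMAGE",
    "AUTO THEFT",
    "ROBBERY",
    "LARCENY ALL OTHERS",
]

model = MajorityBaseline().fit(X, y)
print(model.predict([("D4", "Saturday", 13)]))
```

Five rows are far too few to say anything about accuracy. The sketch only shows the shape of the inputs and outputs in a time-and-location classification setup.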