In urban areas, a major concern raised by residents is the higher rate of crime compared to suburban areas. Crime takes various forms in cities, and knowing what type of crime is likely to take place can help police forces prevent it. The difficulty is that crime is hard to predict: police can keep tracking the activity of past offenders to predict their future behavior, but this overlooks anyone who has not been reported and raises ethical concerns about privacy.
Boston is one large city where crime is a concern. The Boston Police Department releases crime incident reports for each year, which record various attributes of past crimes. The goal of this project is to identify and use a relationship between the factors involved in a crime (e.g. time of occurrence, location) and the type of crime.
If successful, this work may yield a classifier that predicts the type of crime from the factors surrounding it. Such a tool could help police forces allocate their resources and avoid spending funds on deploying officers to areas where they are not needed. It could also support proactive rather than reactive policing, which may lower crime rates in the long run.
One pitfall of this predictor is that it can cement systemic bias and stigmatize communities. Metropolitan cities have higher percentages of minority communities, and labeling certain areas as high risk or low risk can perpetuate bias against them.
We will use a Kaggle dataset of Boston crime records up to 2022 to observe the following factors of a crime, including the incident number and offense code. The first rows of the dataset are:
INDEX | INCIDENT_NUMBER | OFFENSE_CODE | OFFENSE_DESCRIPTION | DISTRICT | SHOOTING | YEAR | MONTH | DAY_OF_WEEK | HOUR | STREET | Location |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 225520077 | 3126 | WARRANT ARREST - OUTSIDE OF BOSTON WARRANT | D14 | 0 | 2022 | 2 | Wednesday | 0 | WASHINGTON ST | (42.34308127134165, -71.14172267328729) |
1 | 222648862 | 3831 | M/V - LEAVING SCENE - PROPERTY DAMAGE | B2 | 0 | 2022 | 2 | Saturday | 18 | WASHINGTON ST | (42.329748204791635, -71.08454011649543) |
2 | 222201764 | 724 | AUTO THEFT | C6 | 0 | 2022 | 1 | Sunday | 0 | W BROADWAY | (42.341287504390436, -71.05467932649397) |
3 | 222201559 | 301 | ROBBERY | D4 | 0 | 2022 | 3 | Saturday | 13 | ALBANY ST | (42.333184490911954, -71.07393881002383) |
4 | 222111641 | 619 | LARCENY ALL OTHERS | D14 | 0 | 2022 | 2 | Monday | 12 | WASHINGTON ST | (42.34905600030506, -71.15049849975023) |
One question that arose while looking at this dataset is whether each offense code maps to a single offense description, or whether two crimes can share an offense code but have different descriptions. If codes and descriptions do not correspond one to one, deriving the crime type becomes more complex, since another dataset would probably be needed to cross-check codes against crime names. To address this, we will inspect a few offense codes and check whether all of their descriptions are the same.
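The consistency check described above could be sketched as follows. This is a minimal illustration in plain Python, not the project's actual code: the helper names `descriptions_by_code` and `ambiguous_codes` are hypothetical, and the input is just the five sample rows from the table rather than the full dataset.

```python
from collections import defaultdict

# (OFFENSE_CODE, OFFENSE_DESCRIPTION) pairs copied from the sample rows in the table.
rows = [
    (3126, "WARRANT ARREST - OUTSIDE OF BOSTON WARRANT"),
    (3831, "M/V - LEAVING SCENE - PROPERTY DAMAGE"),
    (724, "AUTO THEFT"),
    (301, "ROBBERY"),
    (619, "LARCENY ALL OTHERS"),
]


def descriptions_by_code(pairs):
    """Map each offense code to the set of distinct descriptions seen with it."""
    seen = defaultdict(set)
    for code, description in pairs:
        # Strip whitespace so trailing spaces do not count as a different description.
        seen[code].add(description.strip())
    return dict(seen)


def ambiguous_codes(pairs):
    """Return only the codes that appear with more than one description."""
    return {
        code: descs
        for code, descs in descriptions_by_code(pairs).items()
        if len(descs) > 1
    }


conflicts = ambiguous_codes(rows)
print(conflicts)  # {} for these five rows: each code has exactly one description
```

An empty result on the full dataset would suggest that the offense code alone can serve as the class label; any non-empty entry would list the codes that need a closer look.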
We pose our problem as a classification problem: given the time and location of a crime, we want to predict the type of crime that takes place. Doing so will help us see whether there is a relationship between time and location and the type of crime that occurs.
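To make the framing concrete, here is a rough sketch of a frequency baseline: for each (district, hour) cell it predicts the offense type seen most often there, and falls back to the overall most common type for unseen cells. This is only an assumed starting point for comparison, not the model this project commits to. The functions `fit_baseline` and `predict` are hypothetical names, and the data is limited to the five sample rows from the table.

```python
from collections import Counter, defaultdict

# (DISTRICT, DAY_OF_WEEK, HOUR, OFFENSE_DESCRIPTION) taken from the sample rows.
records = [
    ("D14", "Wednesday", 0, "WARRANT ARREST - OUTSIDE OF BOSTON WARRANT"),
    ("B2", "Saturday", 18, "M/V - LEAVING SCENE - PROPERTY DAMAGE"),
    ("C6", "Sunday", 0, "AUTO THEFT"),
    ("D4", "Saturday", 13, "ROBBERY"),
    ("D14", "Monday", 12, "LARCENY ALL OTHERS"),
]


def fit_baseline(rows):
    """Count offense types per (district, hour) cell and across all rows."""
    per_cell = defaultdict(Counter)
    overall = Counter()
    for district, _day, hour, label in rows:
        per_cell[(district, hour)][label] += 1
        overall[label] += 1
    return per_cell, overall


def predict(model, district, hour):
    """Predict the most frequent offense type for a cell, else the global mode."""
    per_cell, overall = model
    counts = per_cell.get((district, hour)) or overall
    return counts.most_common(1)[0][0]


model = fit_baseline(records)
print(predict(model, "D4", 13))  # ROBBERY
```

Any learned classifier would need to beat this kind of baseline on held-out data before its predictions could be said to reflect a real relationship between time, location, and crime type.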