Boston Crime Incident Reporting

Motivation:

Problem

Crime incident reports are on the rise, and for Boston residents, staying safe is a top priority. Crime can be quite unpredictable, which leaves residents worried. Hearing about crime reports is one thing; being able to prevent crime as much as possible is another.


Solution

The Boston Police Department (BPD) records initial details about each incident to which BPD officers respond. These details capture characteristics of a particular crime. The goal of this project is to identify, and then use, the relationship between a crime's recorded characteristics (e.g., location, date, time) and the probability of that crime occurring.


Impact

A successful analysis of this data may help build a classifier that predicts how likely a specific crime is to occur. Such a predictor could help residents judge whether they are at risk and how probable an actual crime is.

More information on crime rates in Boston and their impact on residents can be found here: https://www.covesmart.com/blog/boston-crime-rate-is-boston-a-safe-city/

Dataset

Details

We will use a Crime Incident Report Dataset to observe the following features for each crime:

  • offense code
  • offense description
  • reporting area
  • year
  • month
  • day of the week
  • hour
  • street
  • location

Data Dictionary: Feature Explanation


FEATURE              DESCRIPTION
offense code         code assigned to the type of offense
offense description  short description of the type of offense
year                 year in which the offense occurred
month                month in which the offense occurred
day of the week      day of the week on which the offense occurred
hour                 hour of the day at which the offense occurred
street               name of the street where the offense occurred
location             (latitude, longitude) coordinates of where the offense occurred

Reduced Dataset Example (only the fields used in the analysis are included)


_id  offense_code  offense_description             reporting_area  year  month  day_of_week  hour  street          location
1    1106          FRAUD- CREDIT CARD/ATM FRAUD    574             2023  1      Sunday       0     WASHINGTON ST   (42.30971856767274, -71.10429431787648)
2    2670          HARASSMENT/CRIMINAL HARASSMENT  691             2023  1      Sunday       0     CENTRE ST       (42.28709355259107, -71.14822128377165)
3    1109          FRAUD - WIRE                    355             2023  1      Sunday       0     GIBSON ST       (42.29755532959655, -71.05970910242573)
4    1831          SICK ASSIST                     341             2023  1      Sunday       13    TOPLIFF STREET  None
5    3301          VERBAL DISPUTE                  28              2023  1      Sunday       18    PARIS STREET    None

Why this dataset?

This dataset records simple, clearly defined characteristics of each offense. Given values such as location, time, and date, we can categorize offenses and classify a new crime into one of those categories.
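Before values such as the day and hour can be compared, they need a numeric form. The sketch below shows one option, assuming the full dataset's DAY_OF_WEEK and HOUR columns: each value is placed on a circle with sine and cosine, so that cyclic neighbours (23:00 and 00:00, Sunday and Monday) end up close together. This encoding is an illustrative choice rather than part of the dataset, and `encode_time_features` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Assumed ordering; any fixed order works as long as it is used consistently
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def encode_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: encode DAY_OF_WEEK and HOUR as points on unit circles.

    Unknown day names map to NaN and should be handled like other missing values.
    """
    day = df["DAY_OF_WEEK"].map({name: i for i, name in enumerate(DAYS)})
    hour = df["HOUR"]
    return pd.DataFrame({
        "day_sin": np.sin(2 * np.pi * day / 7),
        "day_cos": np.cos(2 * np.pi * day / 7),
        "hour_sin": np.sin(2 * np.pi * hour / 24),
        "hour_cos": np.cos(2 * np.pi * hour / 24),
    }, index=df.index)


# Three Sunday incidents: midnight, 23:00 and noon
sample = pd.DataFrame({"DAY_OF_WEEK": ["Sunday"] * 3, "HOUR": [0, 23, 12]})
features = encode_time_features(sample).to_numpy()

near = np.linalg.norm(features[0] - features[1])  # 00:00 vs 23:00
far = np.linalg.norm(features[0] - features[2])   # 00:00 vs 12:00
print(f"00:00-23:00 distance: {near:.3f}, 00:00-12:00 distance: {far:.3f}")
```

With the raw HOUR value, midnight and 23:00 would sit 23 units apart; on the circle they are about 0.26 apart, while midnight and noon are 2 apart.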

Potential Problems

As seen above, some incidents are missing data (e.g., location). In the full dataset, these 'None' values are represented as NaN. Missing values can distort the analysis, but replacing them can help minimize their effect.
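A minimal pandas sketch of that replacement step, assuming the full dataset's STREET, Lat and Long columns. It counts the missing values, then shows two options: dropping rows without coordinates, or filling a missing coordinate with the median coordinate of other incidents on the same street and, failing that, the dataset-wide median. The street-then-global fallback is an assumption of this sketch, and a crude one: the dataset-wide median places an incident at an arbitrary central point.

```python
import numpy as np
import pandas as pd

# The five example incidents (full-dataset column names)
df = pd.DataFrame({
    "STREET": ["WASHINGTON ST", "CENTRE ST", "GIBSON ST", "TOPLIFF STREET", "PARIS STREET"],
    "Lat": [42.309719, 42.287094, 42.297555, np.nan, np.nan],
    "Long": [-71.104294, -71.148221, -71.059709, np.nan, np.nan],
})

# How many values are missing in each column?
missing = df.isna().sum()
print(missing)

# Option 1: keep only incidents that have coordinates
located = df.dropna(subset=["Lat", "Long"])

# Option 2: impute missing coordinates instead of discarding the rows
imputed = df.copy()
for col in ["Lat", "Long"]:
    # Median of known coordinates on the same street (NaN if the street has none)
    street_median = imputed.groupby("STREET")[col].transform("median")
    imputed[col] = imputed[col].fillna(street_median)
    # Fall back to the dataset-wide median when the street offers no information
    imputed[col] = imputed[col].fillna(imputed[col].median())
```

Which option suits the real data depends on how many rows lack coordinates, which the same `isna().sum()` call reports for the full DataFrame.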

In [1]:
# Load the full dataset
import pandas as pd

boston_crime = pd.read_csv("boston_crime.csv")
boston_crime.head()
Out[1]:
   INCIDENT_NUMBER  OFFENSE_CODE  OFFENSE_CODE_GROUP  OFFENSE_DESCRIPTION               DISTRICT  REPORTING_AREA  SHOOTING  OCCURRED_ON_DATE        YEAR  MONTH  DAY_OF_WEEK  HOUR  UCR_PART  STREET          Lat        Long        Location
0  232010316        1106          NaN                 FRAUD - CREDIT CARD / ATM FRAUD  E13       574             0         2023-01-01 00:00:00+00  2023  1      Sunday       0     NaN       WASHINGTON ST   42.309719  -71.104294  (42.30971856767274, -71.10429431787648)
1  232014572        2670          NaN                 HARASSMENT/ CRIMINAL HARASSMENT   E5        691             0         2023-01-01 00:00:00+00  2023  1      Sunday       0     NaN       CENTRE ST       42.287094  -71.148221  (42.28709355259107, -71.14822128377165)
2  232011980        1109          NaN                 FRAUD - WIRE                      C11       355             0         2023-01-01 00:00:00+00  2023  1      Sunday       0     NaN       GIBSON ST       42.297555  -71.059709  (42.29755532959655, -71.05970910242573)
3  232000130        1831          NaN                 SICK ASSIST                       C11       341             0         2023-01-01 13:15:00+00  2023  1      Sunday       13    NaN       TOPLIFF STREET  NaN        NaN         NaN
4  232000204        3301          NaN                 VERBAL DISPUTE                    A7        28              0         2023-01-01 18:00:00+00  2023  1      Sunday       18    NaN       PARIS STREET    NaN        NaN         NaN
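The reduced example above keeps only some of the columns shown in this output. A sketch of that selection, assuming the column names printed above (`Location` holds the coordinate pair). `reduce_dataset` is a hypothetical helper, and lowercasing the names simply matches the style of the reduced example.

```python
import numpy as np
import pandas as pd

# Full-dataset columns that the reduced example keeps
REDUCED_COLUMNS = ["OFFENSE_CODE", "OFFENSE_DESCRIPTION", "REPORTING_AREA", "YEAR", "MONTH",
                   "DAY_OF_WEEK", "HOUR", "STREET", "Location"]


def reduce_dataset(full: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only the analysed fields, renamed to lowercase."""
    reduced = full[REDUCED_COLUMNS].copy()
    reduced.columns = [name.lower() for name in REDUCED_COLUMNS]
    return reduced


# Two rows copied from the head() output; on real data: reduce_dataset(boston_crime)
full = pd.DataFrame({
    "INCIDENT_NUMBER": [232010316, 232000130],
    "OFFENSE_CODE": [1106, 1831],
    "OFFENSE_DESCRIPTION": ["FRAUD - CREDIT CARD / ATM FRAUD", "SICK ASSIST"],
    "DISTRICT": ["E13", "C11"],
    "REPORTING_AREA": [574, 341],
    "YEAR": [2023, 2023],
    "MONTH": [1, 1],
    "DAY_OF_WEEK": ["Sunday", "Sunday"],
    "HOUR": [0, 13],
    "STREET": ["WASHINGTON ST", "TOPLIFF STREET"],
    "Location": ["(42.30971856767274, -71.10429431787648)", np.nan],
})
print(reduce_dataset(full))
```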

Method:

We plan to use machine learning methods to analyze our dataset. To predict the frequency of a crime occurring, we intend to use the K-Nearest Neighbors (KNN) algorithm, which classifies a crime by comparing its characteristics with those of the most similar past crimes. To evaluate the predictor's accuracy, we will use a confusion matrix, which shows how often each crime type is classified correctly and which types are confused with one another. In addition, we will provide clear, non-technical graphs, such as bar charts and scatter plots, so that residents without a technical background can read and understand our findings.
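To make the method concrete, here is a from-scratch NumPy sketch of both pieces: a K-Nearest Neighbors classifier that labels a new incident by majority vote among the k most similar past incidents (Euclidean distance), and a confusion matrix that counts true versus predicted classes. The data are synthetic (random clusters standing in for encoded features such as coordinates and hour), and k = 5 and the 75/25 split are arbitrary choices. In practice, scikit-learn's `KNeighborsClassifier` and `sklearn.metrics.confusion_matrix` provide the same functionality.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=5):
    """Label each query row by majority vote among its k nearest training rows."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]  # indices of the k closest training rows
    votes = y_train[nearest]                # their labels, shape (n_query, k)
    # Majority vote; ties go to the smallest label
    return np.array([np.bincount(v).argmax() for v in votes])


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


# Synthetic stand-in: 3 classes, 4 numeric features
rng = np.random.default_rng(0)
centers = rng.normal(scale=3.0, size=(3, 4))
y = rng.integers(0, 3, size=300)
X = centers[y] + rng.normal(size=(300, 4))

# Hold out 25% of the rows for testing
idx = rng.permutation(len(y))
test, train = idx[:75], idx[75:]
y_pred = knn_predict(X[train], y[train], X[test], k=5)
cm = confusion_matrix(y[test], y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(f"accuracy = {accuracy:.2f}")
```

Each row of the matrix sums to the number of test incidents of that class, so off-diagonal entries show which classes are mistaken for which. Because KNN relies on raw distances, features should be put on comparable scales (for example, standardised) before fitting; otherwise the feature with the largest numeric range dominates the vote.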

Overall, our goal is to identify the characteristics associated with each type of crime. With these, we can categorize crimes by their characteristics and estimate how frequently a crime may occur when a new incident shares characteristics with past ones.
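The frequency analysis can start from simple counts. A pandas sketch, assuming the full dataset's OFFENSE_DESCRIPTION, DAY_OF_WEEK and HOUR columns: hours are binned into four periods (the cut-offs are arbitrary and `period_of_day` is a hypothetical helper), and `pd.crosstab(..., normalize="index")` gives the share of each offense among the incidents reported in each day/period cell. The same table could feed a bar chart.

```python
import pandas as pd


def period_of_day(hour: int) -> str:
    """Hypothetical binning of HOUR (0-23) into four coarse periods."""
    if hour < 6:
        return "night"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


# The five example incidents (full-dataset column names)
df = pd.DataFrame({
    "OFFENSE_DESCRIPTION": ["FRAUD - CREDIT CARD / ATM FRAUD", "HARASSMENT/ CRIMINAL HARASSMENT",
                            "FRAUD - WIRE", "SICK ASSIST", "VERBAL DISPUTE"],
    "DAY_OF_WEEK": ["Sunday"] * 5,
    "HOUR": [0, 0, 0, 13, 18],
})
df["PERIOD"] = df["HOUR"].map(period_of_day)

# Rows: (day, period); columns: offense; values: share of that cell's incidents
freq = pd.crosstab([df["DAY_OF_WEEK"], df["PERIOD"]], df["OFFENSE_DESCRIPTION"], normalize="index")
print(freq.round(2))
```

These are shares within each time slot, not the probability that a crime occurs at all; estimating how often crimes occur would need counts per unit of time (for example, incidents per Sunday night across all weeks in the data). With five rows the shares carry no real information; they become estimates only on the full dataset.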