Problem
Crime incident reports are on the rise, and for Boston residents, staying safe is a top priority. Crime can be quite unpredictable, which leads to worry among residents. Hearing about crime reports is one thing; being able to prevent crime as much as possible is another.
Solution
The Boston Police Department (BPD) documents the initial details surrounding each incident to which BPD officers respond. These details often highlight certain characteristics of a particular crime. The goal of this project is to identify and use the relationship between specific recorded characteristics of a crime (e.g., location, date, time) and the probability of that crime occurring.
Impact
A successful analysis of the given data may help create a classifier that predicts how likely a specific crime is to occur. A predictor of this kind would help residents determine whether they are at risk of a crime and how probable it is that the crime will actually occur.
More information on crime rates in Boston and their impact on city residents is available here: https://www.covesmart.com/blog/boston-crime-rate-is-boston-a-safe-city/
Details
We will use a Crime Incident Report Dataset to observe the following features for each crime:
FEATURE | DESCRIPTION |
---|---|
offense code | code assigned to the type of offense |
offense description | short description of the type of offense |
year | year when the offense occurred |
month | month when the offense occurred |
day of the week | day of the week when the offense occurred |
hour | hour of day when the offense occurred |
street | name of the street where the offense occurred |
location | latitude and longitude of where the offense occurred |
_id | offense_code | offense_description | reporting_area | year | month | day_of_week | hour | street | location |
---|---|---|---|---|---|---|---|---|---|
1 | 1106 | FRAUD- CREDIT CARD/ATM FRAUD | 574 | 2023 | 1 | Sunday | 0 | WASHINGTON ST | (42.30971856767274, -71.10429431787648) |
2 | 2670 | HARASSMENT/CRIMINAL HARASSMENT | 691 | 2023 | 1 | Sunday | 0 | CENTRE ST | (42.28709355259107, -71.14822128377165) |
3 | 1109 | FRAUD - WIRE | 355 | 2023 | 1 | Sunday | 0 | GIBSON ST | (42.29755532959655, -71.05970910242573) |
4 | 1831 | SICK ASSIST | 341 | 2023 | 1 | Sunday | 13 | TOPLIFF STREET | None |
5 | 3301 | VERBAL DISPUTE | 28 | 2023 | 1 | Sunday | 18 | PARIS STREET | None |
This dataset provides simple, clear characteristics of each offense. Given values such as location, date, and time, we will be able to categorize offenses and classify a new crime into one of those categories.
As seen above, some crimes are missing data (e.g., location). Many of these 'None' values are represented by NaN. These missing values can distort our analysis, but replacing them can help minimize the effect.
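The replacement of missing values described above could look like the following sketch. It reuses five rows from the dataset preview and the column names `DISTRICT`, `STREET`, `Lat`, and `Long` from the full dataset. Filling a missing coordinate with the median of its police district is only one possible strategy and is an assumption here, as is the helper column `location_missing`.

```python
import pandas as pd

# Five rows copied from the dataset preview; None marks a missing coordinate.
sample = pd.DataFrame({
    "DISTRICT": ["E13", "E5", "C11", "C11", "A7"],
    "STREET": ["WASHINGTON ST", "CENTRE ST", "GIBSON ST",
               "TOPLIFF STREET", "PARIS STREET"],
    "Lat": [42.309719, 42.287094, 42.297555, None, None],
    "Long": [-71.104294, -71.148221, -71.059709, None, None],
})

# Record which rows had no coordinates before anything is filled in.
sample["location_missing"] = sample["Lat"].isna()

# Assumed strategy: replace a missing coordinate with the median of the
# known coordinates in the same police district.
for col in ["Lat", "Long"]:
    district_median = sample.groupby("DISTRICT")[col].transform("median")
    sample[col] = sample[col].fillna(district_median)

# Rows whose district has no known coordinates at all are still NaN here.
still_missing = sample[sample["Lat"].isna()]
```

In this small sample the TOPLIFF STREET row inherits the coordinates of the only other C11 incident, while the PARIS STREET row stays empty because no A7 incident has a known location. Keeping the `location_missing` flag makes it possible to check later whether filled-in rows behave differently from the rest.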
# Imported Version of dataset -- Full Dataset
import pandas as pd
boston_crime = pd.read_csv("boston_crime.csv")
boston_crime.head()
| | INCIDENT_NUMBER | OFFENSE_CODE | OFFENSE_CODE_GROUP | OFFENSE_DESCRIPTION | DISTRICT | REPORTING_AREA | SHOOTING | OCCURRED_ON_DATE | YEAR | MONTH | DAY_OF_WEEK | HOUR | UCR_PART | STREET | Lat | Long | Location |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 232010316 | 1106 | NaN | FRAUD - CREDIT CARD / ATM FRAUD | E13 | 574 | 0 | 2023-01-01 00:00:00+00 | 2023 | 1 | Sunday | 0 | NaN | WASHINGTON ST | 42.309719 | -71.104294 | (42.30971856767274, -71.10429431787648) |
1 | 232014572 | 2670 | NaN | HARASSMENT/ CRIMINAL HARASSMENT | E5 | 691 | 0 | 2023-01-01 00:00:00+00 | 2023 | 1 | Sunday | 0 | NaN | CENTRE ST | 42.287094 | -71.148221 | (42.28709355259107, -71.14822128377165) |
2 | 232011980 | 1109 | NaN | FRAUD - WIRE | C11 | 355 | 0 | 2023-01-01 00:00:00+00 | 2023 | 1 | Sunday | 0 | NaN | GIBSON ST | 42.297555 | -71.059709 | (42.29755532959655, -71.05970910242573) |
3 | 232000130 | 1831 | NaN | SICK ASSIST | C11 | 341 | 0 | 2023-01-01 13:15:00+00 | 2023 | 1 | Sunday | 13 | NaN | TOPLIFF STREET | NaN | NaN | NaN |
4 | 232000204 | 3301 | NaN | VERBAL DISPUTE | A7 | 28 | 0 | 2023-01-01 18:00:00+00 | 2023 | 1 | Sunday | 18 | NaN | PARIS STREET | NaN | NaN | NaN |
We plan to use machine learning methods to analyze our dataset. To predict how likely a crime is to occur, we intend to use the K-Nearest Neighbors algorithm to classify crimes by their characteristics. To test the accuracy of our predictor, we will use a confusion matrix. In addition to these methods, we will provide clear, non-technical graphs, such as bar charts and scatter plots, so that everyday residents can read and understand our findings.
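To make the K-Nearest Neighbors and confusion matrix steps concrete, here is a minimal, library-free sketch. The functions `knn_predict` and `confusion_matrix` are hypothetical helpers written for this illustration, and the training and test points are invented toy data, not rows from the dataset. In practice a library such as scikit-learn, which provides `KNeighborsClassifier` and `confusion_matrix`, would likely be used instead.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are actual labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

# Invented toy data: (hour, day index with Monday = 0) -> offense label.
train_X = [(0, 6), (1, 6), (2, 5), (13, 2), (14, 1), (15, 3)]
train_y = ["FRAUD", "FRAUD", "FRAUD",
           "SICK ASSIST", "SICK ASSIST", "SICK ASSIST"]
test_X = [(1, 5), (12, 2), (3, 6)]
test_y = ["FRAUD", "SICK ASSIST", "SICK ASSIST"]

labels = ["FRAUD", "SICK ASSIST"]
predictions = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
matrix = confusion_matrix(test_y, predictions, labels)

# Accuracy is the share of test points on the diagonal of the matrix.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(test_y)
```

The off-diagonal entry in `matrix` shows which actual offense was mistaken for which predicted one, which a single accuracy number hides. This sketch also treats hour and day as plain numbers, so hour 23 counts as far from hour 0; a real model would need to scale or encode these features more carefully.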
Overall, our goal is to group crimes and identify the specific characteristics associated with each group. By doing so, we can categorize crimes based on these characteristics and estimate how frequently a crime may occur when a new incident shares characteristics with previous ones.
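As a rough illustration of estimating how frequent an offense is among incidents with shared characteristics, the sketch below computes the share of each offense among past incidents that match a given day and hour. It uses the five rows from the dataset preview. The function name `offense_shares` is hypothetical, and matching on exact day and hour is a simplifying assumption.

```python
import pandas as pd

# Five rows copied from the dataset preview.
preview = pd.DataFrame({
    "OFFENSE_DESCRIPTION": [
        "FRAUD - CREDIT CARD / ATM FRAUD",
        "HARASSMENT/ CRIMINAL HARASSMENT",
        "FRAUD - WIRE",
        "SICK ASSIST",
        "VERBAL DISPUTE",
    ],
    "DAY_OF_WEEK": ["Sunday"] * 5,
    "HOUR": [0, 0, 0, 13, 18],
})

def offense_shares(df, day, hour):
    """Share of each offense among past incidents with the same day and hour."""
    matches = df[(df["DAY_OF_WEEK"] == day) & (df["HOUR"] == hour)]
    return matches["OFFENSE_DESCRIPTION"].value_counts(normalize=True)

# Relative frequency of each offense recorded on a Sunday at hour 0.
shares = offense_shares(preview, "Sunday", 0)
```

With only five rows the shares are not meaningful estimates; on the full dataset the same grouping would give an empirical frequency for each offense under the chosen conditions.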