Analyzing Crime in Boston

The Problem

Our data comes from boston.gov and covers crime incidents reported in Boston in 2022. Crime in Boston is a real-world problem, and data science can provide useful insight into it. Homicide, domestic and non-domestic aggravated assault, commercial burglary, and auto theft all rose from 2021 to 2022. This matters because crime causes not only physical harm but also emotional trauma, including (but not limited to) loneliness, low self-esteem, and fear. That trauma can affect not only the victims but also anyone who witnesses the crime. Crime rates in Boston therefore need to be addressed.

https://www.bostonherald.com/2023/01/03/bostons-overall-crime-rate-is-down-1-5-in-2022-but-fatal-shootings-rose-by-8-over-2021/
https://www.ncjrs.gov/ovc_archives/reports/fptp/impactcrm.htm#:~:text=From%20Pain%20To%20Power%3A%20The%20Impact%20of%20Crime&text=Crime%20victims%20often%20suffer%20a,and%20depression%20are%20common%20reactions.


The Data

```python
import pandas as pd
# INCIDENT_NUMBER is an identifier, so read it as text; this also avoids
# the mixed-type DtypeWarning that pandas raised for column 0.
crime_data = pd.read_csv("crime_data.csv", dtype={"INCIDENT_NUMBER": str})
crime_data
```
| | INCIDENT_NUMBER | OFFENSE_CODE | OFFENSE_CODE_GROUP | OFFENSE_DESCRIPTION | DISTRICT | REPORTING_AREA | SHOOTING | OCCURRED_ON_DATE | YEAR | MONTH | DAY_OF_WEEK | HOUR | UCR_PART | STREET | Lat | Long | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 222076257 | 619 | NaN | LARCENY ALL OTHERS | D4 | 167 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | HARRISON AVE | 42.339542 | -71.069409 | (42.33954198983014, -71.06940876967543) |
| 1 | 222053099 | 2670 | NaN | HARASSMENT/ CRIMINAL HARASSMENT | A7 | | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | BENNINGTON ST | 42.377246 | -71.032597 | (42.37724638479816, -71.0325970804128) |
| 2 | 222039411 | 3201 | NaN | PROPERTY - LOST/ MISSING | D14 | 778 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | WASHINGTON ST | 42.349056 | -71.150498 | (42.34905600030506, -71.15049849975023) |
| 3 | 222011090 | 3201 | NaN | PROPERTY - LOST/ MISSING | B3 | 465 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | BLUE HILL AVE | 42.284826 | -71.091374 | (42.28482576580488, -71.09137368938802) |
| 4 | 222062685 | 3201 | NaN | PROPERTY - LOST/ MISSING | B3 | 465 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | BLUE HILL AVE | 42.284826 | -71.091374 | (42.28482576580488, -71.09137368938802) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73847 | 232000091 | 1402 | NaN | VANDALISM | A1 | 66 | 0 | 2022-12-31 23:30:00 | 2022 | 12 | Saturday | 23 | NaN | CHARLES ST | 42.359790 | -71.070782 | (42.35979037458775, -71.07078234449541) |
| 73848 | 232000002 | 3831 | NaN | M/V - LEAVING SCENE - PROPERTY DAMAGE | C11 | | 0 | 2022-12-31 23:37:00 | 2022 | 12 | Saturday | 23 | NaN | COLUMBIA RD | 42.319593 | -71.062607 | (42.31959298334654, -71.06260699634272) |
| 73849 | 232000140 | 619 | NaN | LARCENY ALL OTHERS | D14 | 778 | 0 | 2022-12-31 23:45:00 | 2022 | 12 | Saturday | 23 | NaN | WASHINGTON ST | 42.349056 | -71.150498 | (42.34905600030506, -71.15049849975023) |
| 73850 | 232000315 | 3201 | NaN | PROPERTY - LOST/ MISSING | D4 | 167 | 0 | 2022-12-31 23:50:00 | 2022 | 12 | Saturday | 23 | NaN | HARRISON AVENUE | NaN | NaN | NaN |
| 73851 | 232000052 | 3114 | NaN | INVESTIGATE PROPERTY | A1 | | 0 | 2022-12-31 23:50:00 | 2022 | 12 | Saturday | 23 | NaN | MOUNT VERNON ST | 42.357879 | -71.069680 | (42.357878706878985, -71.06967973039733) |

73852 rows × 17 columns
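Before modeling, it helps to know how complete each column is, since the proposed model depends on the time and location columns. The sketch below runs on three rows copied from the preview (rows 0, 73847, and 73850, reduced to a few columns) so it needs no CSV; on the full dataset the same call would be `crime_data.isna().mean()`. In the preview, OFFENSE_CODE_GROUP and UCR_PART are NaN in every row shown; if that holds across all 73,852 rows, those two columns cannot serve as features.

```python
import numpy as np
import pandas as pd

# Rows 0, 73847, and 73850 from the preview above, reduced to a few columns,
# so this check runs without crime_data.csv.
sample = pd.DataFrame({
    "OFFENSE_CODE_GROUP": [np.nan, np.nan, np.nan],
    "DISTRICT": ["D4", "A1", "D4"],
    "HOUR": [0, 23, 23],
    "UCR_PART": [np.nan, np.nan, np.nan],
    "Lat": [42.339542, 42.359790, np.nan],
    "Long": [-71.069409, -71.070782, np.nan],
})

# Share of missing (NaN) values per column, largest first.
# On the full data this would be crime_data.isna().mean().
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)
```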

Data Dictionary

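  • INCIDENT_NUMBER
    • the identifier of the reported incident
  • OFFENSE_CODE
    • numeric code for the type of offense (in the rows above, 619 is LARCENY ALL OTHERS and 3201 is PROPERTY - LOST/ MISSING)
  • OFFENSE_CODE_GROUP
    • the broader offense category; NaN in every row of the preview above
  • OFFENSE_DESCRIPTION
    • text description of the reported offense; not every entry is a crime (for example, PROPERTY - LOST/ MISSING)
  • DISTRICT
    • the police district where the crime occurred (for example, D4 or A7)
  • REPORTING_AREA
    • the number of the police reporting area, a smaller geographic unit than the district
  • SHOOTING
    • whether or not a shooting occurred (1 for yes, 0 for no)
  • OCCURRED_ON_DATE
    • the date and time the crime occurred (YYYY-MM-DD HH:MM:SS)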
  • YEAR
    • year the crime occurred
  • MONTH
    • month the crime occurred
  • DAY_OF_WEEK
    • day of the week the crime occurred
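  • HOUR
    • the hour of the day the crime occurred (0 to 23)
  • UCR_PART
    • the Uniform Crime Reporting (UCR) part classification of the offense; NaN in every row of the preview above
  • STREET
    • the street name where the crime occurred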
  • Lat
    • the latitude where the crime occurred
  • Long
    • the longitude where the crime occurred
  • Location
    • the latitude and longitude together, formatted as (Lat, Long)
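KNN compares incidents by distance, so the time and location columns described above need to be numeric and on comparable scales. This is a minimal sketch, assuming the features are HOUR, DAY_OF_WEEK, Lat, and Long (the plan does not fix the exact feature list) and that z-score standardization is acceptable; `prepare_features` and `DAY_INDEX` are hypothetical names, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from DAY_OF_WEEK text to an integer 0 (Monday) .. 6 (Sunday).
DAY_INDEX = {day: i for i, day in enumerate(
    ["Monday", "Tuesday", "Wednesday", "Thursday",
     "Friday", "Saturday", "Sunday"])}

def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: standardized time/location matrix for a KNN model."""
    # Rows without coordinates (Lat/Long are NaN) cannot be located, so drop them.
    df = df.dropna(subset=["Lat", "Long"])
    features = pd.DataFrame({
        "HOUR": df["HOUR"].astype(float),
        "DAY_OF_WEEK": df["DAY_OF_WEEK"].map(DAY_INDEX).astype(float),
        "Lat": df["Lat"].astype(float),
        "Long": df["Long"].astype(float),
    })
    # Z-score each column so hours (0 to 23) do not swamp coordinate
    # differences of a fraction of a degree; a constant column keeps std 1.
    std = features.std(ddof=0).replace(0, 1.0)
    return (features - features.mean()) / std

# Rows 0, 73847, and 73850 from the preview; row 73850 has no coordinates.
sample = pd.DataFrame({
    "HOUR": [0, 23, 23],
    "DAY_OF_WEEK": ["Saturday", "Saturday", "Saturday"],
    "Lat": [42.339542, 42.359790, np.nan],
    "Long": [-71.069409, -71.070782, np.nan],
}, index=[0, 73847, 73850])

X = prepare_features(sample)
print(X)
```

Dropping rows such as 73850 loses a few incidents; whether that is acceptable depends on how many rows in the full dataset lack coordinates.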

The Solution

Data science can provide helpful insight by applying machine learning, specifically k-nearest neighbors (KNN), to features such as the time and location of each crime, in order to predict when and where a crime is most likely to occur and how severe it is likely to be. To evaluate the accuracy of these predictions, we will build a confusion matrix.
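The KNN and confusion-matrix steps can be sketched in plain Python. This illustrates the mechanics only: as a stand-in target it predicts DISTRICT from (Lat, Long) using rows from the preview above, whereas the plan's real targets (time, place, or severity of a crime) are left open. `knn_predict` and `confusion_matrix` are hypothetical helpers; scikit-learn's `KNeighborsClassifier` and `sklearn.metrics.confusion_matrix` are library equivalents.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=1):
    """Hypothetical helper: majority label among the k nearest training points."""
    # Sort training points by Euclidean distance to the query (stable sort).
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: math.dist(pair[0], query))
    # Counter.most_common orders equal counts by first encounter,
    # so a tied vote goes to the label of the nearer neighbor.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def confusion_matrix(y_true, y_pred, labels):
    """Hypothetical helper: rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix

# (Lat, Long) and DISTRICT for rows copied from the preview above.
train_points = [
    (42.339542, -71.069409),  # row 0, D4
    (42.377246, -71.032597),  # row 1, A7
    (42.349056, -71.150498),  # row 2, D14
    (42.284826, -71.091374),  # row 3, B3
    (42.359790, -71.070782),  # row 73847, A1
    (42.319593, -71.062607),  # row 73848, C11
]
train_labels = ["D4", "A7", "D14", "B3", "A1", "C11"]

test_points = [
    (42.357879, -71.069680),  # row 73851, A1
    (42.349056, -71.150498),  # row 73849, D14
    (42.284826, -71.091374),  # row 4, B3
]
test_labels = ["A1", "D14", "B3"]

predictions = [knn_predict(train_points, train_labels, q, k=1) for q in test_points]
labels = ["A1", "B3", "D14"]
matrix = confusion_matrix(test_labels, predictions, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Every prediction here lands on the diagonal, but two of the three test points share coordinates with a training point, so this says nothing about real accuracy; a held-out split of the full dataset would be needed for that. The distance also treats a degree of longitude as equal to a degree of latitude, which is only approximate at Boston's latitude.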