Our data is from boston.gov and provides insight into the crimes committed in Boston in 2022. Crime in Boston is a real-world problem, and data science can provide helpful insights. Homicide, domestic and non-domestic aggravated assault, commercial burglary, and auto theft all rose from 2021 to 2022. This is a problem because crime causes not only physical harm but also emotional trauma, including but not limited to loneliness, low self-esteem, and fear. These effects can reach not only the victims but also anyone who witnesses the crime. Thus, the crime rates in Boston need to be addressed.
https://www.bostonherald.com/2023/01/03/bostons-overall-crime-rate-is-down-1-5-in-2022-but-fatal-shootings-rose-by-8-over-2021/
https://www.ncjrs.gov/ovc_archives/reports/fptp/impactcrm.htm#:~:text=From%20Pain%20To%20Power%3A%20The%20Impact%20of%20Crime&text=Crime%20victims%20often%20suffer%20a,and%20depression%20are%20common%20reactions.
Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.
import pandas as pd

# Column 0 (INCIDENT_NUMBER) has mixed types; reading it as a string avoids the DtypeWarning
crime_data = pd.read_csv("crime_data.csv", dtype={"INCIDENT_NUMBER": str})
crime_data
| | INCIDENT_NUMBER | OFFENSE_CODE | OFFENSE_CODE_GROUP | OFFENSE_DESCRIPTION | DISTRICT | REPORTING_AREA | SHOOTING | OCCURRED_ON_DATE | YEAR | MONTH | DAY_OF_WEEK | HOUR | UCR_PART | STREET | Lat | Long | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 222076257 | 619 | NaN | LARCENY ALL OTHERS | D4 | 167 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | HARRISON AVE | 42.339542 | -71.069409 | (42.33954198983014, -71.06940876967543) |
| 1 | 222053099 | 2670 | NaN | HARASSMENT/ CRIMINAL HARASSMENT | A7 | | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | BENNINGTON ST | 42.377246 | -71.032597 | (42.37724638479816, -71.0325970804128) |
| 2 | 222039411 | 3201 | NaN | PROPERTY - LOST/ MISSING | D14 | 778 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | WASHINGTON ST | 42.349056 | -71.150498 | (42.34905600030506, -71.15049849975023) |
| 3 | 222011090 | 3201 | NaN | PROPERTY - LOST/ MISSING | B3 | 465 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | BLUE HILL AVE | 42.284826 | -71.091374 | (42.28482576580488, -71.09137368938802) |
| 4 | 222062685 | 3201 | NaN | PROPERTY - LOST/ MISSING | B3 | 465 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | NaN | BLUE HILL AVE | 42.284826 | -71.091374 | (42.28482576580488, -71.09137368938802) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73847 | 232000091 | 1402 | NaN | VANDALISM | A1 | 66 | 0 | 2022-12-31 23:30:00 | 2022 | 12 | Saturday | 23 | NaN | CHARLES ST | 42.359790 | -71.070782 | (42.35979037458775, -71.07078234449541) |
| 73848 | 232000002 | 3831 | NaN | M/V - LEAVING SCENE - PROPERTY DAMAGE | C11 | | 0 | 2022-12-31 23:37:00 | 2022 | 12 | Saturday | 23 | NaN | COLUMBIA RD | 42.319593 | -71.062607 | (42.31959298334654, -71.06260699634272) |
| 73849 | 232000140 | 619 | NaN | LARCENY ALL OTHERS | D14 | 778 | 0 | 2022-12-31 23:45:00 | 2022 | 12 | Saturday | 23 | NaN | WASHINGTON ST | 42.349056 | -71.150498 | (42.34905600030506, -71.15049849975023) |
| 73850 | 232000315 | 3201 | NaN | PROPERTY - LOST/ MISSING | D4 | 167 | 0 | 2022-12-31 23:50:00 | 2022 | 12 | Saturday | 23 | NaN | HARRISON AVENUE | NaN | NaN | NaN |
| 73851 | 232000052 | 3114 | NaN | INVESTIGATE PROPERTY | A1 | | 0 | 2022-12-31 23:50:00 | 2022 | 12 | Saturday | 23 | NaN | MOUNT VERNON ST | 42.357879 | -71.069680 | (42.357878706878985, -71.06967973039733) |
73852 rows × 17 columns
Data science can provide helpful insights here. Applying a machine learning classifier such as k-nearest neighbors (KNN) to characteristics such as the time and location of each crime lets us predict when and where a crime is most likely to occur, as well as how severe it is likely to be. To evaluate the accuracy of our predictions, we will build a confusion matrix.
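The KNN and confusion-matrix idea can be sketched in a few lines. This is a minimal illustration, not our final pipeline: it runs on synthetic rows that only mimic the `HOUR`, `Lat`, and `Long` columns, and the `"severe"`/`"minor"` label is a hypothetical target that we would still need to define from the real data (for example from `OFFENSE_CODE`). The helper names (`make_rows`, `fit_scaler`, `scale`, `knn_predict`, `confusion_matrix`) are our own and use only the Python standard library; in practice a library such as scikit-learn would likely replace them.

```python
import math
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def make_rows(n, hour_range, lat, lon, label):
    """Generate synthetic (HOUR, Lat, Long) rows with a HYPOTHETICAL severity label."""
    rows = []
    for _ in range(n):
        features = (
            rng.uniform(*hour_range),
            rng.uniform(lat - 0.01, lat + 0.01),
            rng.uniform(lon - 0.01, lon + 0.01),
        )
        rows.append((features, label))
    return rows

# Assumption for illustration only: two well-separated groups of incidents.
rows = (make_rows(100, (0, 5), 42.30, -71.09, "severe")
        + make_rows(100, (10, 18), 42.36, -71.06, "minor"))
rng.shuffle(rows)
split = int(0.8 * len(rows))  # 80% train, 20% test
train, test = rows[:split], rows[split:]

def fit_scaler(points):
    """Return (min, range) per feature, computed on the training data only."""
    return [(min(c), (max(c) - min(c)) or 1.0) for c in zip(*points)]

def scale(point, scaler):
    """Min-max scale one point so every feature contributes comparably to distance."""
    return tuple((v - lo) / span for v, (lo, span) in zip(point, scaler))

def knn_predict(train_X, train_y, query, k=5):
    """Predict the majority label among the k nearest training points (Euclidean)."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

scaler = fit_scaler([x for x, _ in train])
train_X = [scale(x, scaler) for x, _ in train]
train_y = [y for _, y in train]
test_X = [scale(x, scaler) for x, _ in test]
test_y = [y for _, y in test]

labels = ["minor", "severe"]
predictions = [knn_predict(train_X, train_y, x) for x in test_X]
matrix = confusion_matrix(test_y, predictions, labels)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(test_y)
print(matrix)
print(accuracy)
```

The features are scaled before computing distances because `HOUR` spans 0 to 23 while `Lat` and `Long` differ by only fractions of a degree; without scaling, the hour would dominate the nearest-neighbor search. The diagonal of the confusion matrix counts correct predictions, and the off-diagonal cells show which classes are confused with each other.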