Crime Prediction (Boston)

In recent years, safety has been a significant concern for many residents of the United States, and crime rates in cities are much higher than in the suburbs. According to an article by James H. Anderson, the violent-crime rate in urban areas was between 29 percent and 42 percent higher than the rate in rural areas. In a large metropolitan city like Boston, crime is spread across many neighborhoods, so identifying which districts have the highest crime rates can help Boston residents avoid danger.

https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system/resource/313e56df-6d77-49d2-9c49-ee411f10cf58

In [21]:
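import pandas as pd
# Crime incident reports (see the dataset link above)
df_2 = pd.read_csv('tmpdfeo3qy2.csv')
# Field explanations (data dictionary) for the incident reports
df = pd.read_csv('rmscrimeincidentfieldexplanation.xlsx - Sheet1 (2).csv')

# The full dataset is too large to display, so print only the first 20 rows
print(df_2.head(20))

print(df)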
   INCIDENT_NUMBER  OFFENSE_CODE  OFFENSE_CODE_GROUP  \
0        222076257           619                 NaN   
1        222053099          2670                 NaN   
2        222039411          3201                 NaN   
3        222011090          3201                 NaN   
4        222062685          3201                 NaN   
5        222040307          3115                 NaN   
6        222023700          2670                 NaN   
7        222027838          3114                 NaN   
8        222002890          1109                 NaN   
9        222000003           423                 NaN   
10       222000387          2647                 NaN   
11       222000182          3201                 NaN   
12       222000309          3201                 NaN   
13       222000181          3201                 NaN   
14       222000004          1402                 NaN   
15       222000027          3114                 NaN   
16       222000011           423                 NaN   
17       222000006           423                 NaN   
18       222000010          3802                 NaN   
19       222000025          3115                 NaN   

                OFFENSE_DESCRIPTION DISTRICT REPORTING_AREA  SHOOTING  \
0                LARCENY ALL OTHERS       D4            167         0   
1   HARASSMENT/ CRIMINAL HARASSMENT       A7                        0   
2          PROPERTY - LOST/ MISSING      D14            778         0   
3          PROPERTY - LOST/ MISSING       B3            465         0   
4          PROPERTY - LOST/ MISSING       B3            465         0   
5                INVESTIGATE PERSON       A1            954         0   
6   HARASSMENT/ CRIMINAL HARASSMENT       D4                        0   
7              INVESTIGATE PROPERTY       B3                        0   
8                      FRAUD - WIRE       C6            200         0   
9              ASSAULT - AGGRAVATED       D4                        0   
10        THREATS TO DO BODILY HARM       A1             77         0   
11         PROPERTY - LOST/ MISSING       A1             77         0   
12         PROPERTY - LOST/ MISSING      A15             41         0   
13         PROPERTY - LOST/ MISSING       A1                        0   
14                        VANDALISM       E5                        0   
15             INVESTIGATE PROPERTY       A1                        0   
16             ASSAULT - AGGRAVATED       A1                        0   
17             ASSAULT - AGGRAVATED       A1             93         0   
18   M/V ACCIDENT - PROPERTY DAMAGE       D4                        0   
19               INVESTIGATE PERSON      E18            519         0   

       OCCURRED_ON_DATE  YEAR  MONTH DAY_OF_WEEK  HOUR  UCR_PART  \
0   2022-01-01 00:00:00  2022      1    Saturday     0       NaN   
1   2022-01-01 00:00:00  2022      1    Saturday     0       NaN   
2   2022-01-01 00:00:00  2022      1    Saturday     0       NaN   
3   2022-01-01 00:00:00  2022      1    Saturday     0       NaN   
4   2022-01-01 00:00:00  2022      1    Saturday     0       NaN   
5   2022-01-01 00:00:00  2022      1    Saturday     0       NaN   
6   2022-01-01 00:00:00  2022      1    Saturday     0       NaN   
7   2022-01-01 00:01:00  2022      1    Saturday     0       NaN   
8   2022-01-01 00:01:00  2022      1    Saturday     0       NaN   
9   2022-01-01 00:29:00  2022      1    Saturday     0       NaN   
10  2022-01-01 00:30:00  2022      1    Saturday     0       NaN   
11  2022-01-01 00:30:00  2022      1    Saturday     0       NaN   
12  2022-01-01 00:30:00  2022      1    Saturday     0       NaN   
13  2022-01-01 00:30:00  2022      1    Saturday     0       NaN   
14  2022-01-01 00:33:00  2022      1    Saturday     0       NaN   
15  2022-01-01 00:44:00  2022      1    Saturday     0       NaN   
16  2022-01-01 00:46:00  2022      1    Saturday     0       NaN   
17  2022-01-01 00:48:00  2022      1    Saturday     0       NaN   
18  2022-01-01 00:53:00  2022      1    Saturday     0       NaN   
19  2022-01-01 00:57:00  2022      1    Saturday     0       NaN   

              STREET        Lat       Long  \
0       HARRISON AVE  42.339542 -71.069409   
1      BENNINGTON ST  42.377246 -71.032597   
2      WASHINGTON ST  42.349056 -71.150498   
3      BLUE HILL AVE  42.284826 -71.091374   
4      BLUE HILL AVE  42.284826 -71.091374   
5          FULTON ST  42.362936 -71.052538   
6       HARRISON AVE  42.339542 -71.069409   
7          SELDEN ST  42.280894 -71.080375   
8         W BROADWAY  42.341288 -71.054679   
9      BROOKLINE AVE  42.346251 -71.099539   
10    NEW SUDBURY ST  42.361839 -71.059765   
11    NEW SUDBURY ST  42.361839 -71.059765   
12           VINE ST  42.376632 -71.055932   
13    NEW SUDBURY ST  42.361839 -71.059765   
14        DELFORD ST  42.292092 -71.121089   
15        MELROSE ST  42.348965 -71.068485   
16         SCHOOL ST  42.357546 -71.058820   
17          UNION ST  42.360688 -71.056873   
18  SAINT BOTOLPH ST  42.346175 -71.079259   
19         PIERCE ST  42.257114 -71.117191   

                                    Location  
0    (42.33954198983014, -71.06940876967543)  
1     (42.37724638479816, -71.0325970804128)  
2    (42.34905600030506, -71.15049849975023)  
3    (42.28482576580488, -71.09137368938802)  
4    (42.28482576580488, -71.09137368938802)  
5     (42.36293610909294, -71.0525379472723)  
6    (42.33954198983014, -71.06940876967543)  
7    (42.280893655822176, -71.0803746810546)  
8   (42.341287504390436, -71.05467932649397)  
9    (42.34625079905638, -71.09953855872904)  
10  (42.361838566564714, -71.05976489094158)  
11  (42.361838566564714, -71.05976489094158)  
12   (42.37663166813234, -71.05593195497872)  
13  (42.361838566564714, -71.05976489094158)  
14  (42.292092069228836, -71.12108876777533)  
15   (42.34896523984558, -71.06848460481795)  
16    (42.35754619923963, -71.0588195319583)  
17   (42.36068786991018, -71.05687293370812)  
18  (42.346174755447535, -71.07925920373032)  
19   (42.25711405538539, -71.11719142973159)  
    Field Name, Data Type, Required  \
0                   [incident_num]    
1                    [offense_code]   
2  [Offense_Code_Group_Description]   
3             [Offense_Description]   
4                        [district]   
5                 [reporting_area]    
6                        [shooting]   
7         [occurred_on] [datetime2]   
8                           [Month]   
9                            [year]   

                                         Description  
0                         Internal BPD report number  
1              Numerical code of offense description  
2   Internal categorization of [offense_description]  
3                     Primary descriptor of incident  
4            What district the crime was reported in  
5  RA number associated with the where the crime ...  
6                   Indicated a shooting took place.  
7  Earliest date and time the incident could have...  
8                    Which the crime was reported in  
9               which year the crime was reported in  
/var/folders/rg/6rp05ksn32sgc45qc8ftq6gm0000gn/T/ipykernel_69007/3560792410.py:3: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  df_2 = pd.read_csv('tmpdfeo3qy2.csv')
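The DtypeWarning above says that column 0 (INCIDENT_NUMBER in the printed output) mixes value types. One common way to avoid it is to give pandas an explicit dtype for that column when reading the file. The sketch below uses a tiny in-memory stand-in for the CSV; the second incident number, with its letter prefix, is invented purely to show a non-numeric value, and the name `incidents` is illustrative.

```python
import io

import pandas as pd

# Tiny stand-in for the incident file. The "I" prefix on the second ID is
# made up for illustration; it is not taken from the real data.
sample_csv = io.StringIO(
    "INCIDENT_NUMBER,OFFENSE_CODE,DISTRICT\n"
    "222076257,619,D4\n"
    "I222053099,2670,A7\n"
)

# Read the ID column as text so every row has the same type;
# the other columns are still inferred as usual.
incidents = pd.read_csv(sample_csv, dtype={"INCIDENT_NUMBER": str})

print(incidents.dtypes)
```

Applied to the real file, the same `dtype` argument would be passed to the existing `pd.read_csv('tmpdfeo3qy2.csv')` call. Passing `low_memory=False`, as the warning itself suggests, is the other option.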

We will create graphs showing which crimes were reported in which areas, and we will also examine which seasons have higher crime rates. With this information, we can give Boston residents a guide to which areas to avoid and at what times or in which seasons to avoid going out.
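As a minimal sketch of the tallies those graphs would be built from, the example below counts incidents per district, per season, and per offense within each district. It uses six rows copied from the printed sample rather than the full file. The month-to-season mapping is an assumption (meteorological seasons), and the names `incidents`, `SEASON_BY_MONTH`, and `SEASON` are illustrative.

```python
import pandas as pd

# First six rows of the printed sample (all from January 2022)
incidents = pd.DataFrame({
    "OFFENSE_DESCRIPTION": [
        "LARCENY ALL OTHERS",
        "HARASSMENT/ CRIMINAL HARASSMENT",
        "PROPERTY - LOST/ MISSING",
        "PROPERTY - LOST/ MISSING",
        "PROPERTY - LOST/ MISSING",
        "INVESTIGATE PERSON",
    ],
    "DISTRICT": ["D4", "A7", "D14", "B3", "B3", "A1"],
    "MONTH": [1, 1, 1, 1, 1, 1],
})

# Assumed season definition (meteorological seasons); not from the dataset
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}
incidents["SEASON"] = incidents["MONTH"].map(SEASON_BY_MONTH)

# Raw incident counts (not rates) per district and per season
incidents_per_district = incidents["DISTRICT"].value_counts()
incidents_per_season = incidents["SEASON"].value_counts()

# Which offenses were reported in which district
offenses_by_district = pd.crosstab(
    incidents["DISTRICT"], incidents["OFFENSE_DESCRIPTION"]
)

print(incidents_per_district)
print(incidents_per_season)
print(offenses_by_district)
```

On the full data the same calls would run on `df_2`. Calling `.plot(kind="bar")` on either count series is one way to turn these tallies into graphs, provided matplotlib is installed.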

Sources: https://www.city-journal.org/violent-crime-in-cities-on-the-rise

https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system/resource/313e56df-6d77-49d2-9c49-ee411f10cf58