Marginalized groups are widely recognized to be disproportionately harmed by the criminal justice system. Bias influences the likelihood that a person is convicted of a crime, further widening societal inequality.
Analyzing crime data computationally can reveal underlying prejudices and biases in the outcomes of criminal incidents. Given a dataset that records details of each case, such as its location, type of crime, FBI code, and whether an arrest was made, trends in the data can be analyzed to determine what the standard outcome of a case should be. If a dataset that records sentence lengths is available, the number of years sentenced for a case could also be predicted. The goal of this project is to use historical crime data to predict and classify the outcome of a crime incident.
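As a minimal illustration of reading a "standard outcome" from historical data, the sketch below tallies, for each crime type, the most common arrest outcome and how often it occurs. The function name `typical_outcome_by_type` and the toy input are hypothetical and are not drawn from the real dataset. A per-type majority outcome is only a naive baseline that a trained classifier should beat.

```python
from collections import Counter, defaultdict


def typical_outcome_by_type(cases):
    """Return {crime type: (most common arrest outcome, share of cases with it)}.

    `cases` is an iterable of (primary_type, arrested) pairs, where `arrested` is a bool.
    """
    counts = defaultdict(Counter)
    for primary_type, arrested in cases:
        counts[primary_type][arrested] += 1
    summary = {}
    for primary_type, outcome_counts in counts.items():
        outcome, n = outcome_counts.most_common(1)[0]
        summary[primary_type] = (outcome, n / sum(outcome_counts.values()))
    return summary


# Invented toy input shaped like the dataset's columns; not real statistics.
toy_cases = [
    ("NARCOTICS", True), ("NARCOTICS", True), ("NARCOTICS", False),
    ("THEFT", False), ("THEFT", False), ("THEFT", True),
]
print(typical_outcome_by_type(toy_cases))
```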
With the increasing use of AI in jury cases, it is essential to understand the biases and ethical considerations behind many arrests. Such a classifier could provide an unbiased, objective verdict on a criminal case to serve as a baseline, helping improve the quality and efficiency of the criminal justice system.
A potential setback for this classifier is that the data itself may contain many discrepancies rooted in prejudice. For example, variables such as race, class, gender, and age all influence the outcome of a criminal case, and training on a biased dataset may produce biased outcomes.
Link to dataset: https://www.kaggle.com/datasets/chicago/chicago-crime
Relevant columns:
date | primary_type | description | location_description | arrest | community_area | fbi_code | year |
---|---|---|---|---|---|---|---|
07/17/2012 11:30:00 | PUBLIC PEACE VIOLATION | RECKLESS CONDUCT | SIDEWALK | True | 50 | 26 | 2012 |
05/24/2002 11:47:42 | ROBBERY | ARMED: HANDGUN | PARKING LOT/GARAGE(NON.RESID.) | False | 50 | 03 | 2002 |
05/08/2005 09:20:00 | BATTERY | AGGRAVATED: OTHER DANG WEAPON | SIDEWALK | False | 49 | 04B | 2005 |
06/21/2007 11:30:00 | BURGLARY | FORCIBLE ENTRY | CHA APARTMENT | False | 49 | 05 | 2007 |
09/07/2010 09:41:00 | THEFT | OVER $500 | SCHOOL, PUBLIC, GROUNDS | True | 50 | 06 | 2010 |
12/15/2008 10:18:00 | HOMICIDE | FIRST DEGREE MURDER | STREET | False | 49 | 01A | 2008 |
04/21/2018 10:00:00 | CRIM SEXUAL ASSAULT | NON-AGGRAVATED | APARTMENT | False | 50 | 02 | 2018 |
09/05/2018 12:00:00 | CRIMINAL TRESPASS | TO LAND | CONSTRUCTION SITE | True | 49 | 26 | 2018 |
05/10/2007 03:15:00 | NARCOTICS | FORFEIT PROPERTY | STREET | True | 49 | 26 | 2007 |
02/11/2003 01:35:00 | OTHER OFFENSE | VIOLATE ORDER OF PROTECTION | RESIDENCE | False | 49 | 26 | 2003 |
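A minimal loading sketch using only Python's standard library, assuming the data has been exported to CSV with header names matching the columns above. The timestamp format `%m/%d/%Y %H:%M:%S` is inferred from the sample rows and may need adjusting for the actual export (for example, if it includes AM/PM). The function name `load_cases` and the inline sample are illustrative, not part of the dataset's tooling.

```python
import csv
import io
from datetime import datetime

# Column names from the table above; assumes the CSV export uses the same headers.
RELEVANT_COLUMNS = [
    "date", "primary_type", "description", "location_description",
    "arrest", "community_area", "fbi_code", "year",
]


def load_cases(text_stream):
    """Parse CSV rows into typed dicts restricted to RELEVANT_COLUMNS."""
    cases = []
    for row in csv.DictReader(text_stream):
        case = {col: (row.get(col) or "").strip() for col in RELEVANT_COLUMNS}
        if not all(case.values()):
            continue  # skip rows with a missing relevant field
        # Assumed timestamp format, inferred from the sample rows.
        case["date"] = datetime.strptime(case["date"], "%m/%d/%Y %H:%M:%S")
        case["arrest"] = case["arrest"].lower() == "true"
        case["community_area"] = int(case["community_area"])
        case["year"] = int(case["year"])
        # fbi_code stays a string: codes such as "04B" and "01A" are not numeric.
        cases.append(case)
    return cases


# Two rows copied from the table above; the quoted field contains commas.
SAMPLE_CSV = (
    "date,primary_type,description,location_description,arrest,community_area,fbi_code,year\n"
    "07/17/2012 11:30:00,PUBLIC PEACE VIOLATION,RECKLESS CONDUCT,SIDEWALK,True,50,26,2012\n"
    '09/07/2010 09:41:00,THEFT,OVER $500,"SCHOOL, PUBLIC, GROUNDS",True,50,06,2010\n'
)

cases = load_cases(io.StringIO(SAMPLE_CSV))
for case in cases:
    print(case["primary_type"], case["arrest"], case["fbi_code"])
```

Keeping `fbi_code` as text preserves leading zeros ("06") and alphanumeric codes, which would be lost if the column were parsed as an integer.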
This project seeks to use the features above to predict the outcome (arrest or no arrest) of a new criminal case.
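One way to turn a parsed case into model inputs, sketched under the assumption that each case is a dict keyed by the column names above with `date` already parsed into a `datetime`. The helper `case_to_features` and the derived `hour` and `weekday` features are hypothetical choices, not fields of the dataset. `arrest` is kept out of the features because it is the label being predicted.

```python
from datetime import datetime


def case_to_features(case):
    """Map one case (dict) to categorical features; `arrest` is the label, never a feature."""
    when = case["date"]
    return {
        "primary_type": case["primary_type"],
        "description": case["description"],
        "location_description": case["location_description"],
        "fbi_code": case["fbi_code"],
        # Treated as a category, not a number: area 50 is not "more" than area 49.
        "community_area": str(case["community_area"]),
        "hour": str(when.hour),
        # weekday() is locale-independent: Monday is 0, Sunday is 6.
        "weekday": str(when.weekday()),
    }


# Built from the BURGLARY row in the table above.
example_case = {
    "date": datetime(2007, 6, 21, 11, 30),
    "primary_type": "BURGLARY",
    "description": "FORCIBLE ENTRY",
    "location_description": "CHA APARTMENT",
    "arrest": False,
    "community_area": 49,
    "fbi_code": "05",
}
features = case_to_features(example_case)
label = example_case["arrest"]
print(features, label)
```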
As stated before, a major problem is bias in the dataset. Existing biases in the criminal justice system influence whether an arrest is made, which undermines the credibility of this classifier. Furthermore, because it is important to consider the nuances of each criminal case, this classifier can only serve as a baseline judgment of whether a person should be arrested for their crime.
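A first step toward quantifying this bias could be to compare arrest rates across groups for the same crime type. The sketch below is hypothetical: `arrest_rates_by_group` and `disparity_ratio` are illustrative names, and the toy cases are invented. The columns listed above contain no demographic fields, so `community_area` serves only as a coarse geographic proxy, and a gap in arrest rates flags a pattern worth investigating rather than proving bias on its own.

```python
from collections import defaultdict


def arrest_rates_by_group(cases, group_keys):
    """Return {group tuple: arrest rate}, grouping dict-shaped cases by `group_keys`."""
    totals = defaultdict(int)
    arrests = defaultdict(int)
    for case in cases:
        group = tuple(case[key] for key in group_keys)
        totals[group] += 1
        arrests[group] += int(case["arrest"])
    return {group: arrests[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Lowest group rate divided by highest (rates must be non-empty); 1.0 means identical."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Invented toy cases. Compute the ratio within a single crime type so that
# differences in crime mix between areas do not dominate the comparison.
toy_cases = [
    {"primary_type": "NARCOTICS", "community_area": 49, "arrest": True},
    {"primary_type": "NARCOTICS", "community_area": 49, "arrest": True},
    {"primary_type": "NARCOTICS", "community_area": 50, "arrest": True},
    {"primary_type": "NARCOTICS", "community_area": 50, "arrest": False},
]
rates = arrest_rates_by_group(toy_cases, ("primary_type", "community_area"))
print(rates)
print(disparity_ratio(rates))
```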
This is a classification problem: the data above can be used to predict whether a person will be arrested for their crime. With a more thorough dataset that includes sentences for convicted cases, it could also become a regression problem in which the number of years sentenced is predicted from earlier cases. This approach offers a less biased and more efficient method for determining arrests by eliminating confounding variables that typically affect minorities negatively.
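As one simple baseline for the arrest classification task, the sketch below implements categorical Naive Bayes with add-alpha (Laplace) smoothing in plain Python. This is a swapped-in illustrative technique, not a method the project has committed to; scikit-learn's `CategoricalNB` is a library alternative. The class name, the toy training data, and the `alpha` default are assumptions. Like any model trained on historical arrest labels, it learns whatever patterns those decisions contain.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, feature_dicts, labels):
        self.class_counts = Counter(labels)
        # value_counts[label][feature][value] -> number of training cases
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        self.vocab = defaultdict(set)  # distinct values seen per feature
        for features, label in zip(feature_dicts, labels):
            for name, value in features.items():
                self.value_counts[label][name][value] += 1
                self.vocab[name].add(value)
        return self

    def _log_score(self, features, label):
        """Unnormalized log posterior: log P(label) + sum of log P(value | label)."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for name, value in features.items():
            if name not in self.vocab:
                continue  # feature never seen in training carries no information
            seen = self.value_counts[label][name]
            n = sum(seen.values())
            # The extra +1 bucket reserves probability for values unseen in training.
            k = len(self.vocab[name]) + 1
            score += math.log((seen[value] + self.alpha) / (n + self.alpha * k))
        return score

    def predict(self, features):
        return max(self.class_counts, key=lambda label: self._log_score(features, label))


# Invented toy data using column names from the table above; labels are arrest outcomes.
train_X = [
    {"primary_type": "NARCOTICS", "location_description": "STREET"},
    {"primary_type": "NARCOTICS", "location_description": "SIDEWALK"},
    {"primary_type": "CRIMINAL TRESPASS", "location_description": "CONSTRUCTION SITE"},
    {"primary_type": "THEFT", "location_description": "STREET"},
    {"primary_type": "BURGLARY", "location_description": "CHA APARTMENT"},
    {"primary_type": "HOMICIDE", "location_description": "STREET"},
]
train_y = [True, True, True, False, False, False]

model = CategoricalNaiveBayes().fit(train_X, train_y)
print(model.predict({"primary_type": "NARCOTICS", "location_description": "STREET"}))
```

Values never seen in training fall into the reserved smoothing bucket instead of raising an error, which matters because new cases may carry location descriptions absent from the training data.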