Criminal Sentence Prediction

I. Motivation

Problem

A major concern in criminal proceedings is possible bias in court sentencing. There have been many allegations that defendants of certain races, ethnicities, or genders have received longer sentences than others, despite committing the same crime in a similar manner.

Solution

Data provided by the National Archive of Criminal Justice Data (NACJD) offers insight into federal criminal sentences for over 75,000 people. The dataset contains each defendant's age, citizenship, race, ethnicity, gender, length of prison sentence, type of crime committed, number of counts of conviction, and criminal history points. The goal of this project is to combat bias in court sentencing by developing a prototype regression model that predicts a defendant's prison sentence from their criminal history, the crime committed, and similar relevant information.

Impact

If successful, the model will predict a defendant's prison sentence for federal crimes with reasonable accuracy. Because court proceedings are notoriously complex, the model may not capture every factor needed to predict a sentence. It may still serve as a baseline for future models and artificial intelligence systems that try to eradicate bias in sentencing by offering a less-biased method of determining a defendant's prison sentence.

II. Dataset

Detail

A Federal Criminal Sentences database covering 2018-2019 was obtained from the National Archive of Criminal Justice Data (NACJD). It records the following features for each defendant:

  • AGE (defendant's age at time of sentencing)
  • CITIZEN (nature of defendant's citizenship)
  • MONRACE (defendant's race)
  • HISPORIG (defendant's ethnic origin)
  • MONSEX (defendant's gender)
  • TOTPRISN (total months of imprisonment ordered)
  • OFFGUIDE (primary type of crime for the case)
  • NOCOUNTS (number of counts of conviction)
  • TOTCHPTS (total number of criminal history points applied)

The first five rows of the dataset:

   AGE  CITIZEN                    MONRACE                 HISPORIG      MONSEX    TOTPRISN  OFFGUIDE               NOCOUNTS  TOTCHPTS
0  34   (3) Illegal alien          (01) White / Caucasian  (2) Hispanic  (0) Male  30        (10) Drug Trafficking  1         0
1  36   (3) Illegal alien          (01) White / Caucasian  (2) Hispanic  (0) Male  0         (17) Immigration       1         1
2  50   (1) United States citizen  (01) White / Caucasian  (2) Hispanic  (0) Male  18        (17) Immigration       1         0
3  41   (3) Illegal alien          (01) White / Caucasian  (2) Hispanic  (0) Male  21        (17) Immigration       1         0
4  27   (3) Illegal alien          (01) White / Caucasian  (2) Hispanic  (0) Male  18        (17) Immigration       1         0

For qualitative variables, we will use the assigned numeric codes in place of the strings so that the values can be used in regression. We will use these features, excluding race, ethnicity, citizenship, and gender, to develop a regression model while we look for underlying correlations or biases in the data.
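
The replacement of labelled strings by their numeric codes could be sketched as follows. This is a minimal illustration, assuming the raw values carry the code in leading parentheses exactly as in the sample rows above; the helper name `label_to_code` is hypothetical and not taken from the original notebook.

```python
import re

# Matches a leading "(code)" prefix, e.g. "(10) Drug Trafficking".
_CODE_PATTERN = re.compile(r"^\s*\((\d+)\)")


def label_to_code(label):
    """Return the integer inside the leading parentheses of a labelled value."""
    match = _CODE_PATTERN.match(label)
    if match is None:
        raise ValueError(f"no numeric code found in {label!r}")
    return int(match.group(1))


# Qualitative values copied from the first sample row of the dataset.
sample = {
    "CITIZEN": "(3) Illegal alien",
    "MONRACE": "(01) White / Caucasian",
    "HISPORIG": "(2) Hispanic",
    "MONSEX": "(0) Male",
    "OFFGUIDE": "(10) Drug Trafficking",
}

codes = {name: label_to_code(value) for name, value in sample.items()}
print(codes)
```

Of these coded columns, only OFFGUIDE would feed the model, since race, ethnicity, citizenship, and gender are excluded from the inputs.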

Potential Problems

As mentioned above, criminal proceedings are complex and the specifics of each case are intricate, so the model will be difficult to perfect. For simpler court cases, however, it may be able to estimate defendants' prison sentences accurately.

Method

The problem will be treated as a regression task. Given the features shown above, with qualitative variables replaced by their corresponding numeric codes and with race, ethnicity, citizenship, and gender excluded, the model will estimate the defendant's prison sentence (TOTPRISN).
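
One way to picture this setup is ordinary least squares on the four remaining inputs (AGE, OFFGUIDE, NOCOUNTS, TOTCHPTS) with TOTPRISN as the target. The source does not say which regression variant is used, so plain linear regression is an assumption here. The rows below are synthetic, generated from a made-up linear rule rather than taken from the NACJD data, and the names `fit_ols` and `predict` are hypothetical.

```python
def fit_ols(rows, targets):
    """Fit ordinary least squares by solving the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept term
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot solve")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Apply fitted coefficients (intercept first) to one feature row."""
    return coefs[0] + sum(c * float(v) for c, v in zip(coefs[1:], row))


FEATURES = ["AGE", "OFFGUIDE", "NOCOUNTS", "TOTCHPTS"]

# Made-up coefficients (intercept first) used only to generate toy targets.
TRUE_COEFS = [5.0, 0.1, -0.5, 6.0, 2.5]

# Synthetic feature rows in FEATURES order; NOT real NACJD records.
ROWS = [
    (34, 10, 1, 0),
    (36, 17, 1, 1),
    (50, 17, 2, 0),
    (41, 10, 3, 4),
    (27, 17, 1, 6),
    (22, 10, 2, 3),
    (58, 17, 4, 2),
]
TARGETS = [predict(TRUE_COEFS, row) for row in ROWS]  # toy TOTPRISN values

coefs = fit_ols(ROWS, TARGETS)
for name, value in zip(["intercept"] + FEATURES, coefs):
    print(f"{name}: {value:.3f}")
```

Because the toy targets follow the made-up rule exactly, the fit recovers those coefficients; real sentencing data would be noisy, and a library implementation would likely be preferable to this hand-rolled solver.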