Labor Market Discrimination

Motivation:

Problem

Despite significant advances in racial equality in past decades, differential treatment by race continues to pervade the U.S. labor market, especially during the hiring process.

Solution

This dataset offers an opportunity to observe the impact of race in the labor market: researchers sent thousands of fictitious resumes in response to help-wanted advertisements in Boston and Chicago. Each fictitious applicant was described by numerous attributes, covering education level, skills, and experience as well as demographic data such as race, age, and gender. The goal of this project is to determine whether a relationship exists between race and hirability after accounting for the applicant's qualifications.

Impact

This work may have wide implications for hiring across industries, and may point to a need for reform. I aim to build a classifier that predicts how likely an individual is to be called back based on their characteristics. Such a predictor may expose inconsistencies in the callback process and help estimate how often a qualified candidate is passed over because of race.

Dataset

Detail

I will use an OpenIntro dataset of fictitious job applications to observe the following features for each applicant:

  • education
  • n_jobs (number of jobs listed on resume)
  • years_exp
  • honors
  • volunteer
  • computer_skills
  • special_skills
  • first_name
  • sex
  • race
  • h (1 = high quality resume)
  • l (1 = low quality resume)
  • call (1 = applicant was called back)

(** note: the data includes 63 columns in total; those listed above are the focus of this study.)

In [1]:
import pandas as pd

# load the fictitious-resume data from a local CSV (pandas can also read zipped CSVs directly)
df_labor = pd.read_csv('labor_market_discrimination.csv')
df_labor.head()
Out[1]:
   education  n_jobs  years_exp  honors  volunteer  military  emp_holes  occup_specific  occup_broad  work_in_school  ...  comp_req  org_req  manuf  trans_com  bank_real  trade  bus_service  oth_service  miss_ind  ownership
0          4       2          6       0          0         0          1              17            1               0  ...         1        0      1          0          0      0            0            0         0        NaN
1          3       3          6       0          1         1          0             316            6               1  ...         1        0      1          0          0      0            0            0         0        NaN
2          4       1          6       0          0         0          0              19            1               1  ...         1        0      1          0          0      0            0            0         0        NaN
3          3       4          6       0          1         0          1             313            5               0  ...         1        0      1          0          0      0            0            0         0        NaN
4          3       3         22       0          0         0          0             313            5               1  ...         1        1      0          0          0      0            0            1         0  Nonprofit

5 rows × 63 columns
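
As a first look at the features listed above, the sketch below narrows a frame to the focus columns and computes the callback rate per race group. It runs on a tiny made-up frame named toy rather than on df_labor, and the race codes "b" and "w" are an assumption about how the column is encoded; the real file may differ.

```python
import pandas as pd

# Hypothetical toy rows standing in for df_labor; all values are made up.
toy = pd.DataFrame({
    "race": ["b", "w", "b", "w", "b", "w"],   # assumed encoding
    "sex":  ["f", "f", "m", "m", "f", "m"],
    "call": [0, 1, 0, 0, 1, 1],
})

# Columns this study focuses on; keep only those actually present in the frame.
focus = ["education", "n_jobs", "years_exp", "honors", "volunteer",
         "computer_skills", "special_skills", "first_name", "sex", "race",
         "h", "l", "call"]
present = [c for c in focus if c in toy.columns]
subset = toy[present]

# The mean of a 0/1 column is the share of ones, i.e. the callback rate.
callback_rate = subset.groupby("race")["call"].mean()
print(callback_rate)
```

Swapping toy for df_labor should give the raw callback rate by race, which is a useful baseline before any modelling.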

Method:

To address this problem, I will fit a logistic regression, a common model for binary classification. It will be used to predict the probability that a person with a given set of characteristics receives a callback. Additionally, I will cluster resumes by their characteristics to identify patterns or similarities among resumes that received callbacks.
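
The two steps described here could look roughly like the following. This is a minimal sketch using scikit-learn, which is my assumption since the notebook does not name a library. It runs on randomly generated stand-in data, not the real resumes, and the feature subset, the "b"/"w" race codes, and the choice of three clusters are all illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for df_labor: random values, NOT the real data.
rng = np.random.default_rng(0)
n = 400
toy = pd.DataFrame({
    "education": rng.integers(0, 5, n),
    "years_exp": rng.integers(1, 25, n),
    "honors": rng.integers(0, 2, n),
    "race": rng.choice(["b", "w"], n),   # assumed encoding
})
# Made-up outcome: callbacks become more likely with experience and honors.
logit = -2.0 + 0.08 * toy["years_exp"] + 0.8 * toy["honors"]
toy["call"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# One-hot encode race so it enters the model as a 0/1 indicator (race_w).
X = pd.get_dummies(toy.drop(columns="call"), columns=["race"],
                   drop_first=True, dtype=int)
y = toy["call"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Standardise features using training statistics only.
scaler = StandardScaler().fit(X_train)
clf = LogisticRegression().fit(scaler.transform(X_train), y_train)

# Predicted callback probability for each held-out resume.
proba = clf.predict_proba(scaler.transform(X_test))[:, 1]
# One coefficient per (standardised) feature, including the race indicator.
coefs = pd.Series(clf.coef_[0], index=X.columns)

# Cluster all resumes on the same standardised features, then compare
# observed callback rates across clusters.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    scaler.transform(X))
rate_by_cluster = toy.groupby(labels)["call"].mean()
```

On the real data, the sign and size of the race_w coefficient would be the quantity of interest: it describes how the callback odds shift with race once the other features are held fixed.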

** note: Because I have not yet conducted a logistic regression in Python, I may alternatively use linear regression to predict callbacks from an individual's characteristics, and then conduct a comparison by race. With call coded 0/1, the fitted value can be read as an estimated callback probability.
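
A minimal sketch of this fallback, again assuming scikit-learn and synthetic stand-in data: the model is fit on qualifications only, and the observed and predicted callback rates are then compared within each race group. The column subset and the "b"/"w" codes are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for df_labor: random values, NOT the real data.
rng = np.random.default_rng(1)
n = 300
toy = pd.DataFrame({
    "years_exp": rng.integers(1, 25, n),
    "honors": rng.integers(0, 2, n),
    "race": rng.choice(["b", "w"], n),   # assumed encoding
})
toy["call"] = (rng.random(n) < 0.05 + 0.01 * toy["years_exp"]).astype(int)

# Fit on qualifications only; race is deliberately left out of the model.
features = ["years_exp", "honors"]
lpm = LinearRegression().fit(toy[features], toy["call"])
toy["predicted"] = lpm.predict(toy[features])

# Observed vs. qualification-based predicted callback rate, by race.
comparison = toy.groupby("race")[["call", "predicted"]].mean()
print(comparison)
```

If one group's observed rate sits below its predicted rate while the other sits above, qualifications alone do not account for the difference. One caveat with this approach is that a linear fit can return values below 0 or above 1, which is one reason logistic regression is usually preferred for a binary outcome.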