https://www.themedicalcareblog.com/insurance-based-discrimination/
Access to affordable, quality healthcare has been a top priority for many American voters in recent years. However, many obstacles stand in the way of this goal, and the healthcare system is far from perfect. One of the main problems is unfair health insurance policies that discriminate based on race, gender identity, sexual orientation, age, and health status. Leveling the playing field in health insurance is the first step towards improving medical care in America.
import pandas as pd
df = pd.read_csv('insurance_data.csv')
d = df.to_dict()
d.keys()
dict_keys(['index', 'PatientID', 'age', 'gender', 'bmi', 'bloodpressure', 'diabetic', 'children', 'smoker', 'region', 'claim'])
- index: order in the data set (starting from 0)
- PatientID: order in the data set (starting from 1)
- age: patient's age in years
- gender: patient's gender
- bmi: patient's body mass index, calculated using height and weight
- bloodpressure: patient's systolic blood pressure (upper number) in millimeters of mercury
- diabetic: indicates whether or not the patient is diabetic
- children: number of children the patient has
- smoker: indicates whether or not the patient is a smoker
- region: geographical region of the patient in the United States
- claim: amount of the insurance claim in dollars
df.head()
|   | index | PatientID | age | gender | bmi | bloodpressure | diabetic | children | smoker | region | claim |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 39.0 | male | 23.2 | 91 | Yes | 0 | No | southeast | 1121.87 |
| 1 | 1 | 2 | 24.0 | male | 30.1 | 87 | No | 0 | No | southeast | 1131.51 |
| 2 | 2 | 3 | NaN | male | 33.3 | 82 | Yes | 0 | No | southeast | 1135.94 |
| 3 | 3 | 4 | NaN | male | 33.7 | 80 | No | 0 | No | northwest | 1136.40 |
| 4 | 4 | 5 | NaN | male | 34.1 | 100 | No | 0 | No | northwest | 1137.01 |
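The preview shows NaN in the age column for three of the five rows, so it may be worth counting missing values per column before modeling. The sketch below is only an illustration: it rebuilds a few columns of the five preview rows inline (the name `sample` is an assumption, not part of the original notebook) so it runs without `insurance_data.csv`. On the full data set, the same call would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Illustrative sample: selected columns of the five rows shown by df.head().
sample = pd.DataFrame({
    "PatientID": [1, 2, 3, 4, 5],
    "age": [39.0, 24.0, np.nan, np.nan, np.nan],
    "gender": ["male"] * 5,
    "bmi": [23.2, 30.1, 33.3, 33.7, 34.1],
    "claim": [1121.87, 1131.51, 1135.94, 1136.40, 1137.01],
})

# Count missing entries in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

How to handle the missing ages (dropping rows, imputing, or something else) is a separate decision that this sketch does not make.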
To better understand inequities in the health insurance process, we will use machine learning methods to predict insurance claims. We will look at demographics, for example by clustering on age and gender to see whether certain groups receive higher claims than others, as well as health factors such as blood pressure, diabetic status, and smoker status, to determine whether adverse health conditions impact insurance claims.
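Before any modeling, a simple group comparison can give a first look at whether claims differ by a health factor. The following is a minimal sketch, not the analysis itself: `claim_summary_by` is a hypothetical helper, and `sample` rebuilds a few columns of the five preview rows inline so the example runs on its own. With the full data set, `df` would take the place of `sample`.

```python
import pandas as pd

# Illustrative sample: selected columns of the five rows shown by df.head().
sample = pd.DataFrame({
    "gender": ["male"] * 5,
    "diabetic": ["Yes", "No", "Yes", "No", "No"],
    "smoker": ["No"] * 5,
    "claim": [1121.87, 1131.51, 1135.94, 1136.40, 1137.01],
})


def claim_summary_by(df, column):
    """Hypothetical helper: row count and mean claim for each value of `column`."""
    return df.groupby(column)["claim"].agg(["count", "mean"])


# Compare claims between diabetic and non-diabetic patients.
diabetic_summary = claim_summary_by(sample, "diabetic")
print(diabetic_summary)
```

The same helper could be called with `"smoker"`, `"gender"`, or `"region"`. Five rows are far too few to support any conclusion, so this only shows the shape of the comparison.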