Insurance Claim Analysis: Demographic and Health

Shawn Lokshin

https://www.themedicalcareblog.com/insurance-based-discrimination/

Access to affordable, high-quality healthcare has been a top priority for many American voters in recent years. However, many obstacles stand in the way of this goal, and the healthcare system is far from perfect. One of the main problems is unfair health insurance policies that discriminate on the basis of race, gender identity, sexual orientation, age, and health status. Leveling the playing field in health insurance is a first step toward improving medical care in America.

In [2]:
import pandas as pd

# Load the claims data and list its column names
df = pd.read_csv('insurance_data.csv')
df.columns.tolist()
Out[2]:
['index', 'PatientID', 'age', 'gender', 'bmi', 'bloodpressure', 'diabetic', 'children', 'smoker', 'region', 'claim']

index: order in the data set (starting from 0)
PatientID: order in the data set (starting from 1)
age: patient's age in years
gender: patient's gender
bmi: patient's body mass index, calculated from height and weight
bloodpressure: patient's systolic blood pressure (upper number), in millimeters of mercury (mmHg)
diabetic: whether the patient is diabetic (Yes/No)
children: number of children the patient has
smoker: whether the patient is a smoker (Yes/No)
region: geographic region of the patient in the United States
claim: amount of the insurance claim, in dollars
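Before a model can use these columns, the text-valued ones need numeric encodings. The following is a minimal sketch, assuming `diabetic` and `smoker` contain only the strings "Yes" and "No" (as the `df.head()` output suggests) and that one-hot encoding is acceptable for `gender` and `region`. The function name `encode_features` is hypothetical.

```python
import pandas as pd


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every column converted to numbers.

    Assumes diabetic and smoker hold only "Yes"/"No"; any other value becomes NaN.
    """
    out = df.copy()
    # Map the Yes/No flags to 1/0 so models can use them directly
    for col in ["diabetic", "smoker"]:
        out[col] = out[col].map({"Yes": 1, "No": 0})
    # One-hot encode the unordered categories; drop_first avoids a redundant column
    out = pd.get_dummies(out, columns=["gender", "region"], drop_first=True, dtype=int)
    # index and PatientID only record row order, so drop them as features
    return out.drop(columns=["index", "PatientID"])
```

For example, `encoded = encode_features(df)` yields a numeric frame; missing `age` values are left untouched.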

In [3]:
df.head()
Out[3]:
   index  PatientID   age  gender   bmi  bloodpressure  diabetic  children  smoker     region    claim
0      0          1  39.0    male  23.2             91       Yes         0      No  southeast  1121.87
1      1          2  24.0    male  30.1             87        No         0      No  southeast  1131.51
2      2          3   NaN    male  33.3             82       Yes         0      No  southeast  1135.94
3      3          4   NaN    male  33.7             80        No         0      No  northwest  1136.40
4      4          5   NaN    male  34.1            100        No         0      No  northwest  1137.01
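The preview already shows missing `age` values (`NaN`), so it is worth counting gaps in every column before modeling. A minimal sketch that prints the per-column counts and fills missing ages with the median age; median imputation is an assumption chosen for simplicity, not necessarily what this analysis uses, and `fill_missing_age` is a hypothetical helper name.

```python
import pandas as pd


def fill_missing_age(df: pd.DataFrame) -> pd.DataFrame:
    """Print missing-value counts and return a copy with missing ages filled.

    Uses the median age as the fill value (an illustrative choice).
    """
    missing = df.isna().sum()
    # Show only the columns that actually have gaps
    print(missing[missing > 0])
    out = df.copy()
    out["age"] = out["age"].fillna(out["age"].median())
    return out
```

Usage: `df = fill_missing_age(df)`. Dropping incomplete rows with `df.dropna(subset=['age'])` is a simpler alternative, at the cost of losing data.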

To better understand inequities in the health insurance process, we will use machine learning methods to predict the amount of each patient's insurance claim. We will look at demographics, for example clustering patients by age and gender to see whether certain groups receive higher claim amounts than others, as well as health factors such as blood pressure, diabetic status, and smoker status, to determine whether adverse health conditions affect insurance claims.
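The plan above can be sketched with pandas and NumPy. This is a minimal illustration under stated assumptions: it compares mean claims across fixed age bands (a simple binning, not a clustering algorithm), gender, and health flags, and fits an ordinary least-squares model as one simple stand-in for the machine learning methods mentioned. The helper names and the age-band edges are hypothetical, and the regression assumes `smoker` and `diabetic` have already been mapped from Yes/No to 1/0.

```python
import numpy as np
import pandas as pd


def add_age_band(df: pd.DataFrame, edges=(0, 30, 45, 60, 120)) -> pd.DataFrame:
    """Return a copy of df with an age_band column.

    The default edges are arbitrary, illustrative cut points (hypothetical).
    Bands are closed on the left, e.g. [30, 45).
    """
    out = df.copy()
    out["age_band"] = pd.cut(out["age"], bins=list(edges), right=False)
    return out


def mean_claim_by(df: pd.DataFrame, by: list) -> pd.DataFrame:
    """Mean claim and patient count for each combination of the `by` columns."""
    # observed=True keeps only category combinations that actually occur
    return df.groupby(by, observed=True)["claim"].agg(["mean", "count"])


def fit_linear_claim_model(df: pd.DataFrame, features: list) -> pd.Series:
    """Ordinary least squares of claim on numeric `features`.

    Rows with a missing value in any used column are dropped first.
    Returns an intercept plus one coefficient per feature.
    """
    data = df[features + ["claim"]].dropna()
    # Design matrix: a column of ones for the intercept, then the features
    X = np.column_stack([np.ones(len(data)), data[features].to_numpy(dtype=float)])
    y = data["claim"].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["intercept"] + features)
```

For example, `mean_claim_by(add_age_band(df), ['age_band', 'gender'])` tabulates the demographic comparison, and `fit_linear_claim_model(df, ['bloodpressure', 'bmi', 'smoker', 'diabetic'])` returns one coefficient per health factor, each read as the change in expected claim per unit of that factor with the others held fixed.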