There are many types of heart disease, including diseased blood vessels, structural problems, and blood clots (Google). The most common is coronary artery disease, in which damage to the blood vessels can lead to a heart attack. According to the NY Department of Health, about 697,000 people die of heart disease every year in the US. In addition, about 805,000 Americans have a heart attack every year.
By applying data science techniques to medical data, we can estimate the likelihood that a person will have heart disease based on their medical examination data.
See the Department of Health website for more statistics.
import pandas as pd
df_heart = pd.read_csv('cardio_train.csv', sep=';')
df_heart
| | id | age | gender | height | weight | ap_hi | ap_lo | cholesterol | gluc | smoke | alco | active | cardio |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 18393 | 2 | 168 | 62.0 | 110 | 80 | 1 | 1 | 0 | 0 | 1 | 0 |
1 | 1 | 20228 | 1 | 156 | 85.0 | 140 | 90 | 3 | 1 | 0 | 0 | 1 | 1 |
2 | 2 | 18857 | 1 | 165 | 64.0 | 130 | 70 | 3 | 1 | 0 | 0 | 0 | 1 |
3 | 3 | 17623 | 2 | 169 | 82.0 | 150 | 100 | 1 | 1 | 0 | 0 | 1 | 1 |
4 | 4 | 17474 | 1 | 156 | 56.0 | 100 | 60 | 1 | 1 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
69995 | 99993 | 19240 | 2 | 168 | 76.0 | 120 | 80 | 1 | 1 | 1 | 0 | 1 | 0 |
69996 | 99995 | 22601 | 1 | 158 | 126.0 | 140 | 90 | 2 | 2 | 0 | 0 | 1 | 1 |
69997 | 99996 | 19066 | 2 | 183 | 105.0 | 180 | 90 | 3 | 1 | 0 | 1 | 0 | 1 |
69998 | 99998 | 22431 | 1 | 163 | 72.0 | 135 | 80 | 1 | 2 | 0 | 0 | 0 | 1 |
69999 | 99999 | 20540 | 1 | 170 | 72.0 | 120 | 80 | 2 | 1 | 0 | 0 | 1 | 0 |
70000 rows × 13 columns
id (int) - unique identifier of patient
age (int) - age in days
height (int) - height in cm
weight (float) - weight in kg
gender - gender (1: female; 2: male)
ap_hi (int) - Systolic blood pressure
ap_lo (int) - Diastolic blood pressure
cholesterol - Cholesterol (1: normal, 2: above normal, 3: well above normal)
gluc - Glucose (1: normal, 2: above normal, 3: well above normal)
smoke - Smoking (whether or not patient smokes) (0: no; 1: yes)
alco - Alcohol intake (whether or not patient drinks) (0: no; 1: yes)
active - Physical activity (whether or not patient is active) (0: no; 1: yes)
cardio - Presence or absence of cardiovascular disease (0: no; 1: yes)
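Apart from id, which only identifies the patient, and cardio, which is the outcome we want to explain, every attribute above is a potential risk factor for heart disease.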
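Because age is stored in days and several columns use numeric codes, it can help to derive more readable columns before comparing groups. The snippet below is a minimal sketch that uses the first five rows from the table above instead of the full CSV. The column names `age_years`, `gender_label`, and `cholesterol_label` are hypothetical additions, not part of the dataset, and dividing by 365.25 is an assumed approximation for converting days to years.

```python
import pandas as pd

# First five rows copied from the table above (only the columns needed here).
sample = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "age": [18393, 20228, 18857, 17623, 17474],  # age in days
    "gender": [2, 1, 1, 2, 1],
    "cholesterol": [1, 3, 3, 1, 1],
})

# Hypothetical derived column: approximate age in years (assumes 365.25 days per year).
sample["age_years"] = (sample["age"] / 365.25).round(1)

# Hypothetical label columns, decoded with the codes listed in the data dictionary.
sample["gender_label"] = sample["gender"].map({1: "female", 2: "male"})
sample["cholesterol_label"] = sample["cholesterol"].map(
    {1: "normal", 2: "above normal", 3: "well above normal"}
)

print(sample[["id", "age_years", "gender_label", "cholesterol_label"]])
```

The same transformations could be applied to `df_heart` directly once the full file is loaded.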
We will group the data by three lifestyle attributes (smoke, alco, and active) and compare the examination data within each group. Doing so will help us discover which measurements doctors should pay attention to when examining patients to prevent possible heart disease.
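One possible way to set up this comparison is a `groupby` on each lifestyle column followed by a mean over the examination columns. The sketch below runs on the ten rows visible in the table above so that it is self-contained. The names `exam_columns` and `summaries` are illustrative choices, and ten rows are far too few to draw any conclusion, so the real analysis would use the full `df_heart`.

```python
import pandas as pd

columns = ["id", "age", "gender", "height", "weight", "ap_hi", "ap_lo",
           "cholesterol", "gluc", "smoke", "alco", "active", "cardio"]

# The ten rows displayed in the table above (first five and last five).
rows = [
    [0, 18393, 2, 168, 62.0, 110, 80, 1, 1, 0, 0, 1, 0],
    [1, 20228, 1, 156, 85.0, 140, 90, 3, 1, 0, 0, 1, 1],
    [2, 18857, 1, 165, 64.0, 130, 70, 3, 1, 0, 0, 0, 1],
    [3, 17623, 2, 169, 82.0, 150, 100, 1, 1, 0, 0, 1, 1],
    [4, 17474, 1, 156, 56.0, 100, 60, 1, 1, 0, 0, 0, 0],
    [99993, 19240, 2, 168, 76.0, 120, 80, 1, 1, 1, 0, 1, 0],
    [99995, 22601, 1, 158, 126.0, 140, 90, 2, 2, 0, 0, 1, 1],
    [99996, 19066, 2, 183, 105.0, 180, 90, 3, 1, 0, 1, 0, 1],
    [99998, 22431, 1, 163, 72.0, 135, 80, 1, 2, 0, 0, 0, 1],
    [99999, 20540, 1, 170, 72.0, 120, 80, 2, 1, 0, 0, 1, 0],
]
sample = pd.DataFrame(rows, columns=columns)

# Examination measurements to compare, plus the cardio outcome.
exam_columns = ["weight", "ap_hi", "ap_lo", "cholesterol", "gluc"]

# One summary table per lifestyle attribute: mean of each measurement
# for the 0 (no) group and the 1 (yes) group.
summaries = {}
for habit in ["smoke", "alco", "active"]:
    summaries[habit] = sample.groupby(habit)[exam_columns + ["cardio"]].mean()
    print(f"Grouped by {habit}:")
    print(summaries[habit].round(2))
```

In each summary, the `cardio` column holds the share of patients with cardiovascular disease in that group, since the mean of a 0/1 column is a proportion.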