Predicting the Possibility of Heart Disease

There are many types of heart disease, including diseased vessels, structural problems, and blood clots (Google). The most common is coronary artery disease, in which damage to the blood vessels can lead to a heart attack. According to the NY Department of Health, about 697,000 people die of heart disease every year in the US. In addition, about 805,000 Americans have a heart attack every year.

By applying data science techniques to medical data, we can estimate how likely a person is to have heart disease based on their medical examination data.

See the NY Department of Health website for more statistics.

In [1]:
import pandas as pd
# The file is semicolon-delimited, so the separator is given explicitly.
df_heart = pd.read_csv('cardio_train.csv', sep=';')
In [2]:
df_heart
Out[2]:
          id    age  gender  height  weight  ap_hi  ap_lo  cholesterol  gluc  smoke  alco  active  cardio
    0      0  18393       2     168    62.0    110     80            1     1      0     0       1       0
    1      1  20228       1     156    85.0    140     90            3     1      0     0       1       1
    2      2  18857       1     165    64.0    130     70            3     1      0     0       0       1
    3      3  17623       2     169    82.0    150    100            1     1      0     0       1       1
    4      4  17474       1     156    56.0    100     60            1     1      0     0       0       0
  ...    ...    ...     ...     ...     ...    ...    ...          ...   ...    ...   ...     ...     ...
69995  99993  19240       2     168    76.0    120     80            1     1      1     0       1       0
69996  99995  22601       1     158   126.0    140     90            2     2      0     0       1       1
69997  99996  19066       2     183   105.0    180     90            3     1      0     1       0       1
69998  99998  22431       1     163    72.0    135     80            1     2      0     0       0       1
69999  99999  20540       1     170    72.0    120     80            2     1      0     0       1       0

70000 rows × 13 columns

Data Dictionary

id (int) - unique identifier of patient

age (int) - age in days

height (int) - height in cm

weight (float) - weight in kg

gender - gender (1: female; 2: male)

ap_hi (int) - Systolic blood pressure

ap_lo (int) - Diastolic blood pressure

cholesterol - Cholesterol (1: normal, 2: above normal, 3: well above normal)

gluc - Glucose (1: normal, 2: above normal, 3: well above normal)

smoke - Smoking (whether or not patient smokes) (0: no; 1: yes)

alco - Alcohol intake (whether or not patient drinks) (0: no; 1: yes)

active - Physical activity (whether or not patient is active) (0: no; 1: yes)

cardio - Presence or absence of cardiovascular disease (0: no; 1: yes)

Apart from id (an identifier) and cardio (the outcome we want to predict), each of the attributes above is a factor that may contribute to heart disease.
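Because age is stored in days and several columns are stored as numeric codes, it can help to derive more readable columns before exploring the data. The following is a minimal sketch on a few rows copied from the table output above. The column names `age_years`, `gender_label`, and `cholesterol_label` are hypothetical, and the 365.25 days-per-year divisor is an assumption, not something stated by the dataset.

```python
import pandas as pd

# A few rows copied from the table output above (subset of columns).
sample = pd.DataFrame({
    "id": [0, 1, 2],
    "age": [18393, 20228, 18857],   # age in days
    "gender": [2, 1, 1],
    "cholesterol": [1, 3, 3],
})

# Assumption: 365.25 days per year, to roughly account for leap years.
sample["age_years"] = (sample["age"] / 365.25).round(1)

# Label maps taken from the data dictionary.
gender_labels = {1: "female", 2: "male"}
level_labels = {1: "normal", 2: "above normal", 3: "well above normal"}

# Hypothetical helper columns holding human-readable labels.
sample["gender_label"] = sample["gender"].map(gender_labels)
sample["cholesterol_label"] = sample["cholesterol"].map(level_labels)

print(sample)
```

The same transformations should apply unchanged to the full `df_heart` frame, since they only rely on the column names listed in the data dictionary.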

Usage of Data

We will group the data by three lifestyle attributes (smoke, alco, and active) and compare the examination data across the groups. Doing so will help us discover which measurements doctors should pay attention to when examining patients to prevent possible heart disease.
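One possible way to carry out this comparison is a pandas `groupby` on each lifestyle flag, taking the mean of the examination columns within each group. The sketch below uses only the ten rows visible in the table output above, so its numbers are illustrative and say nothing about the full dataset. The helper name `compare_by` and the choice of columns in `exam_cols` (including the `cardio` outcome) are assumptions made for this example.

```python
import pandas as pd

# The ten rows visible in the table output above (subset of columns).
# The full analysis would use df_heart instead of this sample.
sample = pd.DataFrame({
    "ap_hi":       [110, 140, 130, 150, 100, 120, 140, 180, 135, 120],
    "ap_lo":       [80, 90, 70, 100, 60, 80, 90, 90, 80, 80],
    "cholesterol": [1, 3, 3, 1, 1, 1, 2, 3, 1, 2],
    "gluc":        [1, 1, 1, 1, 1, 1, 2, 1, 2, 1],
    "smoke":       [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    "alco":        [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    "active":      [1, 1, 0, 1, 0, 1, 1, 0, 0, 1],
    "cardio":      [0, 1, 1, 1, 0, 0, 1, 1, 1, 0],
})

# Assumed set of columns to compare; cardio is included as the outcome rate.
exam_cols = ["ap_hi", "ap_lo", "cholesterol", "gluc", "cardio"]

def compare_by(df, flag):
    """Return the mean of each examination column for flag == 0 and flag == 1."""
    return df.groupby(flag)[exam_cols].mean()

# One comparison table per lifestyle attribute.
for flag in ["smoke", "alco", "active"]:
    print(compare_by(sample, flag), end="\n\n")
```

Treating the ordinal codes in `cholesterol` and `gluc` as numbers when averaging is a simplification; comparing the share of patients at each level would be an alternative.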