Heart Failure

Research suggests that heart failure has become a global health hazard whose burden will increase dramatically over the next decade. The number of people affected is expected to rise by at least 46% by 2030, a projection almost double that of the last decade. This increase can be attributed to a growing number of people living with heart disease and diabetes. Beyond these health factors, the discussion should also consider socioeconomic factors such as income, education level, race, and geographic location. As heart disease has become more common, heart failure has also increased because of strokes and high blood pressure. To combat this problem, an analysis is needed of who is most vulnerable and which factors should be targeted to reduce the likelihood of this disease.

Some supporting links are listed below:
- link one
- link two

link to dataset

In [19]:
import pandas as pd


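def read_data_dict(filename, type_cast_dict=None):
    """Read a comma-separated file into a list of dictionaries, one per row.

    type_cast_dict optionally maps a column name to a function (such as int
    or float) used to convert that column's values; other columns stay strings.
    """
    # Avoid a mutable default argument, which would be shared between calls
    if type_cast_dict is None:
        type_cast_dict = {}

    data = []
    # The with block closes the file even if a row fails to parse
    with open(filename, "r") as file:
        # The first line holds the column names
        headers = file.readline().strip().split(",")

        for line in file:
            pieces = line.strip().split(",")

            row_dict = {}
            # go through each column and link the value
            # to the appropriate header
            for header, piece in zip(headers, pieces):
                if header in type_cast_dict:
                    cast_func = type_cast_dict[header]
                    row_dict[header] = cast_func(piece)
                else:
                    row_dict[header] = piece

            data.append(row_dict)

    return data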
In [22]:
data = read_data_dict('heart_failure_clinical_records_dataset.csv')
data[0:5]
Out[22]:
[{'age': '75',
  'anaemia': '0',
  'creatinine_phosphokinase': '582',
  'diabetes': '0',
  'ejection_fraction': '20',
  'high_blood_pressure': '1',
  'platelets': '265000',
  'serum_creatinine': '1.9',
  'serum_sodium': '130',
  'sex': '1',
  'smoking': '0',
  'time': '4',
  'DEATH_EVENT': '1'},
 {'age': '55',
  'anaemia': '0',
  'creatinine_phosphokinase': '7861',
  'diabetes': '0',
  'ejection_fraction': '38',
  'high_blood_pressure': '0',
  'platelets': '263358.03',
  'serum_creatinine': '1.1',
  'serum_sodium': '136',
  'sex': '1',
  'smoking': '0',
  'time': '6',
  'DEATH_EVENT': '1'},
 {'age': '65',
  'anaemia': '0',
  'creatinine_phosphokinase': '146',
  'diabetes': '0',
  'ejection_fraction': '20',
  'high_blood_pressure': '0',
  'platelets': '162000',
  'serum_creatinine': '1.3',
  'serum_sodium': '129',
  'sex': '1',
  'smoking': '1',
  'time': '7',
  'DEATH_EVENT': '1'},
 {'age': '50',
  'anaemia': '1',
  'creatinine_phosphokinase': '111',
  'diabetes': '0',
  'ejection_fraction': '20',
  'high_blood_pressure': '0',
  'platelets': '210000',
  'serum_creatinine': '1.9',
  'serum_sodium': '137',
  'sex': '1',
  'smoking': '0',
  'time': '7',
  'DEATH_EVENT': '1'},
 {'age': '65',
  'anaemia': '1',
  'creatinine_phosphokinase': '160',
  'diabetes': '1',
  'ejection_fraction': '20',
  'high_blood_pressure': '0',
  'platelets': '327000',
  'serum_creatinine': '2.7',
  'serum_sodium': '116',
  'sex': '0',
  'smoking': '0',
  'time': '8',
  'DEATH_EVENT': '1'}]

Data Features

Each dictionary in the list represents a person afflicted with heart disease. Each record includes the person's age, their sex, whether they smoke, whether they died (DEATH_EVENT), whether they have diabetes, and other health factors that help determine what led to their heart disease diagnosis.
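Every value in the output above is still a string, so the columns need to be converted before any numeric analysis. The sketch below shows one possible mapping from column name to type. The types are assumptions inferred from the sample rows (for example, platelets can be fractional), and `TYPE_CASTS` and `cast_row` are hypothetical names introduced only for illustration.

```python
# Assumed column types, inferred from the sample rows shown above.
TYPE_CASTS = {
    "age": float,                 # float as a precaution against fractional ages
    "anaemia": int,
    "creatinine_phosphokinase": int,
    "diabetes": int,
    "ejection_fraction": int,
    "high_blood_pressure": int,
    "platelets": float,           # e.g. '263358.03' is not an integer
    "serum_creatinine": float,
    "serum_sodium": int,
    "sex": int,
    "smoking": int,
    "time": int,
    "DEATH_EVENT": int,
}


def cast_row(row, casts):
    """Return a copy of row with each value converted by its cast function."""
    # Columns missing from casts are left as strings
    return {key: casts.get(key, str)(value) for key, value in row.items()}


# Second row of the output above, copied verbatim.
raw_row = {
    "age": "55", "anaemia": "0", "creatinine_phosphokinase": "7861",
    "diabetes": "0", "ejection_fraction": "38", "high_blood_pressure": "0",
    "platelets": "263358.03", "serum_creatinine": "1.1", "serum_sodium": "136",
    "sex": "1", "smoking": "0", "time": "6", "DEATH_EVENT": "1",
}

typed_row = cast_row(raw_row, TYPE_CASTS)
print(typed_row["platelets"], typed_row["DEATH_EVENT"])
```

The same dictionary could instead be passed as the `type_cast_dict` argument of `read_data_dict`, so that the conversion happens while the file is read.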

Data Usage

My main use of the data is to build a machine learning model that predicts mortality from heart failure, in order to see who is most at risk and which factors contribute most to the likelihood of heart failure. From this, I will be able to deduce what patients need to watch out for the most, whether that is smoking or high blood pressure.
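Before fitting any model, a simple first step could be to compare the death rate across the values of a single factor. The sketch below is not a machine learning model, only an exploratory baseline. `death_rate_by` is a hypothetical helper, and the rows are the `smoking`, `high_blood_pressure`, and `DEATH_EVENT` values of the five records shown earlier.

```python
from collections import defaultdict


def death_rate_by(rows, factor):
    """Return the fraction of deaths for each value of the given factor."""
    counts = defaultdict(lambda: [0, 0])  # factor value -> [deaths, total]
    for row in rows:
        tally = counts[row[factor]]
        tally[0] += int(row["DEATH_EVENT"])
        tally[1] += 1
    return {value: deaths / total for value, (deaths, total) in counts.items()}


# Three columns from the five records shown in the output earlier.
rows = [
    {"smoking": "0", "high_blood_pressure": "1", "DEATH_EVENT": "1"},
    {"smoking": "0", "high_blood_pressure": "0", "DEATH_EVENT": "1"},
    {"smoking": "1", "high_blood_pressure": "0", "DEATH_EVENT": "1"},
    {"smoking": "0", "high_blood_pressure": "0", "DEATH_EVENT": "1"},
    {"smoking": "0", "high_blood_pressure": "0", "DEATH_EVENT": "1"},
]

print(death_rate_by(rows, "smoking"))
print(death_rate_by(rows, "high_blood_pressure"))
```

All five records shown happen to be deaths, so every rate here is 1.0. The comparison only becomes informative when run on the full list returned by `read_data_dict`.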
