Predicting Risk for Mental Illness

For this project, I would like to use machine learning to predict the risk of a mental health disorder based on education, income level, and symptoms. Mental health issues have become very prevalent in recent years, and there has been a lot of discussion about how to reduce the number of people affected by them. I think it would be really interesting to use machine learning to evaluate whether someone may be at risk for a mental health issue based on education and socioeconomic status.

rising mental health issues

current mental health x machine learning projects

data set

In [52]:
import pandas as pd

data_df = pd.read_excel('Cleaned Data.xlsx')
data_df
Out[52]:
| | I am currently employed at least part-time | I identify as having a mental illness | Education | I have my own computer separate from a smart phone | I have been hospitalized before for my mental illness | How many days were you hospitalized for your mental illness | I am legally disabled | I have my regular access to the internet | I live with my parents | I have a gap in my resume | ... | Obsessive thinking | Mood swings | Panic attacks | Compulsive behavior | Tiredness | Age | Gender | Household Income | Region | Device Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | High School or GED | 0 | 0 | 0.0 | 0 | 1 | 0 | 1 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 30-44 | Male | $25,000-$49,999 | Mountain | Android Phone / Tablet |
| 1 | 1 | 1 | Some Phd | 1 | 0 | 0.0 | 0 | 1 | 0 | 0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 18-29 | Male | $50,000-$74,999 | East South Central | MacOS Desktop / Laptop |
| 2 | 1 | 0 | Completed Undergraduate | 1 | 0 | 0.0 | 0 | 1 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 30-44 | Male | $150,000-$174,999 | Pacific | MacOS Desktop / Laptop |
| 3 | 0 | 0 | Some Undergraduate | 1 | 0 | NaN | 0 | 1 | 1 | 1 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 30-44 | Male | $25,000-$49,999 | New England | Windows Desktop / Laptop |
| 4 | 1 | 1 | Completed Undergraduate | 1 | 1 | 35.0 | 1 | 1 | 0 | 1 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 30-44 | Male | $25,000-$49,999 | East North Central | iOS Phone / Tablet |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 329 | 0 | 0 | High School or GED | 1 | 0 | NaN | 1 | 1 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 45-60 | Female | Prefer not to answer | Mountain | Android Phone / Tablet |
| 330 | 1 | 0 | Some Undergraduate | 1 | 0 | 0.0 | 0 | 1 | 1 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 18-29 | Male | $50,000-$74,999 | Pacific | Windows Desktop / Laptop |
| 331 | 1 | 0 | Some Undergraduate | 1 | 0 | 0.0 | 0 | 1 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | > 60 | Female | $10,000-$24,999 | West North Central | Windows Desktop / Laptop |
| 332 | 0 | 1 | Some Undergraduate | 0 | 1 | 1.0 | 1 | 1 | 1 | 1 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 18-29 | Female | $0-$9,999 | West South Central | Android Phone / Tablet |
| 333 | 1 | 1 | Some Undergraduate | 1 | 0 | 0.0 | 1 | 1 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | 18-29 | Female | $10,000-$24,999 | Pacific | Android Phone / Tablet |

334 rows × 31 columns

In [51]:
# Data dictionary: maps each column name to a short description of its values.
data_dict = {
    'I am currently employed at least part-time': 'employment status',
    'I identify as having a mental illness': '0=no, 1=yes',
    'Education': 'level of education completed',
    'I have my own computer separate from a smart phone': '0=no, 1=yes',
    'I have been hospitalized before for my mental illness': '0=no, 1=yes',
    'How many days were you hospitalized for your mental illness': 'days spent in hospital',
    'I am legally disabled': '0=no, 1=yes',
    'I have my regular access to the internet': '0=no, 1=yes',
    'I live with my parents': '0=no, 1=yes',
    'I have a gap in my resume': 'gaps in resume due to MH 0=no, 1=yes',
    # This key contains a non-breaking space (\xa0) before 'months.'
    'Total length of any gaps in my resume in\xa0months.': 'length of gap due to MH',
    'Annual income (including any social welfare programs) in USD': 'income range in thousands? not sure',
    'I am unemployed': 'employment status',
    'I read outside of work and school': '0=no, 1=yes',
    'Annual income from social welfare programs': 'in thousands? over what period of time?',
    'I receive food stamps': '0=no, 1=yes',
    'I am on section 8 housing': '0=no, 1=yes',
    'How many times were you hospitalized for your mental illness': 'number of hospitalizations',
    'Lack of concentration': '0=no, 1=yes',
    'Anxiety': '0=no, 1=yes',
    'Depression': '0=no, 1=yes',
    'Obsessive thinking': '0=no, 1=yes',
    'Mood swings': '0=no, 1=yes',
    'Panic attacks': '0=no, 1=yes',
    'Compulsive behavior': '0=no, 1=yes',
    'Tiredness': '0=no, 1=yes',
    'Age': 'age',
    'Gender': 'Male or Female',
    'Household Income': 'income range',
    'Region': 'Which part of US',
    'Device Type': 'Android, Windows, Mac',
}

data_dict
Out[51]:
{'I am currently employed at least part-time': 'employment status',
 'I identify as having a mental illness': '0=no, 1=yes',
 'Education': 'level of education completed',
 'I have my own computer separate from a smart phone': '0=no, 1=yes',
 'I have been hospitalized before for my mental illness': '0=no, 1=yes',
 'How many days were you hospitalized for your mental illness': 'days spent in hospital',
 'I am legally disabled': '0=no, 1=yes',
 'I have my regular access to the internet': '0=no, 1=yes',
 'I live with my parents': '0=no, 1=yes',
 'I have a gap in my resume': 'gaps in resume due to MH 0=no, 1=yes',
 'Total length of any gaps in my resume in\xa0months.': 'length of gap due to MH',
 'Annual income (including any social welfare programs) in USD': 'income range in thousands? not sure',
 'I am unemployed': 'employment status',
 'I read outside of work and school': '0=no, 1=yes',
 'Annual income from social welfare programs': 'in thousands? over what period of time?',
 'I receive food stamps': '0=no, 1=yes',
 'I am on section 8 housing': '0=no, 1=yes',
 'How many times were you hospitalized for your mental illness': 'number of hospitalizations',
 'Lack of concentration': '0=no, 1=yes',
 'Anxiety': '0=no, 1=yes',
 'Depression': '0=no, 1=yes',
 'Obsessive thinking': '0=no, 1=yes',
 'Mood swings': '0=no, 1=yes',
 'Panic attacks': '0=no, 1=yes',
 'Compulsive behavior': '0=no, 1=yes',
 'Tiredness': '0=no, 1=yes',
 'Age': 'age',
 'Gender': 'Male or Female',
 'Household Income': 'income range',
 'Region': 'Which part of US',
 'Device Type': 'Android, Windows, Mac'}
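
Since the data dictionary is typed by hand, it may be worth checking that its keys line up with the actual column names. The sketch below is one possible way to do that; `check_data_dict` is a hypothetical helper (not part of pandas), and the toy column list is made up to show how a hidden non-breaking space, like the `\xa0` visible in the output above, could cause a mismatch. Whether the real spreadsheet column contains that character is an assumption.

```python
def check_data_dict(columns, data_dict):
    """Compare column names against data-dictionary keys.

    Returns two lists:
      undocumented - columns that have no entry in data_dict
      unknown      - data_dict keys that match no column
    """
    columns = list(columns)
    undocumented = [c for c in columns if c not in data_dict]
    unknown = [k for k in data_dict if k not in columns]
    return undocumented, unknown


# Toy example (made-up inputs): the column name uses a non-breaking space,
# while the dictionary key was typed with a regular space.
toy_columns = ["Age", "Total length of any gaps in my resume in\xa0months."]
toy_dict = {
    "Age": "age",
    "Total length of any gaps in my resume in months.": "length of gap due to MH",
}

undocumented, unknown = check_data_dict(toy_columns, toy_dict)

# repr() makes invisible characters such as \xa0 visible.
print("Columns without a description:", [repr(c) for c in undocumented])
print("Descriptions without a column:", [repr(k) for k in unknown])
```

With the real data, the call would be `check_data_dict(data_df.columns, data_dict)`; two empty lists would mean every column is documented.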

I am planning to group the data by income range. This will allow for analysis of mental health risk by income level. I would also consider using cross-validation to evaluate the model on different segments of the data.
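
One way to read the plan above is as a group-by on the `Household Income` column followed by a cross-validated classifier. The following is a minimal sketch under several assumptions: `toy_df` is a made-up stand-in for `data_df`, the choice of `Anxiety` and `Depression` as features is only illustrative, and logistic regression with stratified 3-fold cross-validation (from scikit-learn) is one possible setup, not necessarily the one this project will use.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

TARGET = "I identify as having a mental illness"
INCOME = "Household Income"

# Toy stand-in for data_df (made-up values, not the survey data).
toy_df = pd.DataFrame({
    INCOME: ["$0-$9,999"] * 4 + ["$25,000-$49,999"] * 4 + ["$50,000-$74,999"] * 4,
    "Anxiety":    [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1],
    "Depression": [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1],
    TARGET:       [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1],
})

# 1) Share of respondents reporting a mental illness in each income bracket.
rate_by_income = toy_df.groupby(INCOME)[TARGET].mean()
print(rate_by_income)

# 2) Cross-validation: one-hot encode the income bracket, keep the 0/1
#    symptom columns as they are, and score a simple classifier on 3 folds.
X = pd.get_dummies(
    toy_df[[INCOME, "Anxiety", "Depression"]],
    columns=[INCOME],
    dtype=float,
)
y = toy_df[TARGET]

# Stratified folds keep the yes/no ratio similar in every fold.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("Accuracy per fold:", scores)
```

On the real data, rows with missing values (the `NaN` entries visible in the table above) would need to be dropped or imputed before fitting, since `LogisticRegression` does not accept `NaN` inputs.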
