
(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).

Data science can be valuable for understanding suicide rates and how they differ across age groups and countries. Exploring the statistics around mental health issues, and how those issues affect people worldwide, is an important step toward addressing them. Data science could help establish intervention and prevention strategies, build predictive models, and support inferences about how these rates are distributed. It can also give us a better understanding of current suicide rates and of the mental health resources currently available, and help develop effective strategies to prevent suicide.

Recent work from the CDC and ICF shows how data science can help reveal trends in suicide and mental health issues. To make sense of this issue, data science could prove useful for discovering commonalities among cases worldwide.

Works Cited

https://www.cdc.gov/surveillance/blogs-stories/Suicide-Trends.html

https://www.icf.com/insights/health/data-science-mental-health-emergency#:~:text=Data%20science%20can%20help%20better,active%20mental%20illness%20take%20root.

(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.

In [1]:

import pandas as pd

# establish data dictionary, read, and explore dataset

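Column Name	Description
Country	Country name
Year	Year the figures were reported
Psychiatrists	Number of psychiatrists working in the mental health sector (per 100,000 population)
Nurses	Number of nurses working in the mental health sector (per 100,000 population)
Social_workers	Number of social workers working in the mental health sector (per 100,000 population)
Psychologists	Number of psychologists working in the mental health sector (per 100,000 population)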

In [2]:

df_resources = pd.read_csv('human_resources.csv')
print(df_resources.head())

               Country  Year  Psychiatrists  Nurses  Social_workers  \
0          Afghanistan  2016          0.231   0.098             NaN   
1              Albania  2016          1.471   6.876           1.060   
2               Angola  2016          0.057   0.660           0.022   
3  Antigua and Barbuda  2016          1.001   7.005           4.003   
4            Argentina  2016         21.705     NaN             NaN   

   Psychologists  
0          0.296  
1          1.231  
2          0.179  
3            NaN  
4        222.572
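The preview above already shows several NaN entries, so it may help to quantify missingness before relying on any column. The following is a minimal sketch, not part of the original notebook: `sample_resources` is a hand-typed stand-in for the five printed rows, and `missing_summary` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Stand-in for df_resources: the five rows printed above, typed in by hand.
sample_resources = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Angola",
                "Antigua and Barbuda", "Argentina"],
    "Year": [2016] * 5,
    "Psychiatrists": [0.231, 1.471, 0.057, 1.001, 21.705],
    "Nurses": [0.098, 6.876, 0.660, 7.005, np.nan],
    "Social_workers": [np.nan, 1.060, 0.022, 4.003, np.nan],
    "Psychologists": [0.296, 1.231, 0.179, np.nan, 222.572],
})


def missing_summary(df):
    """Return the count and share of missing values for each column."""
    return pd.DataFrame({
        "n_missing": df.isna().sum(),
        "share_missing": df.isna().mean(),
    })


summary = missing_summary(sample_resources)
print(summary)
```

Calling `missing_summary(df_resources)` on the full dataset would show which workforce columns are complete enough to use as features.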

Column Name	Description
Country	Country name
Sex	Group the rates apply to: both sexes, male, or female
80_above, 70to79, 60to69, 50to59, 40to49, 30to39, 20to29, 10to19	Crude suicide rate for each age range, from 80 and above down to 10 to 19

In [3]:

df_crude_rates = pd.read_csv('crude_suicide_rates.csv')
print(df_crude_rates.head())

       Country          Sex   80_above   70to79   60to69    50to59    40to49  \
0  Afghanistan   Both sexes       42.0     11.0       5.5       5.6      6.6   
1  Afghanistan         Male       70.4     20.9       9.8       9.3     10.5   
2  Afghanistan       Female       20.1      2.3       1.4       1.6      2.3   
3      Albania   Both sexes       16.3      8.3       6.0       7.8      9.1   
4      Albania         Male       23.2     11.9       8.1      11.4     13.5   

    30to39   20to29   10to19  
0      9.2     10.2      3.1  
1     15.1     16.3      4.8  
2      2.7      3.5      1.2  
3      6.1      6.5      5.0  
4      8.8      6.3      3.1
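Because each age range is stored as its own column, comparing rates across age groups is often easier after reshaping to one row per country, sex, and age group. The sketch below assumes the column names shown in the output above; `sample_rates` is a hand-typed stand-in for the five printed rows, and `to_long`, `Age_group`, and `Rate` are hypothetical names chosen for illustration. Stripping whitespace is a defensive assumption, since the printed output cannot show whether the raw values carry padding.

```python
import pandas as pd

age_cols = ["80_above", "70to79", "60to69", "50to59",
            "40to49", "30to39", "20to29", "10to19"]

# Stand-in for df_crude_rates: the five rows printed above, typed in by hand.
rows = [
    ["Afghanistan", "Both sexes", 42.0, 11.0, 5.5, 5.6, 6.6, 9.2, 10.2, 3.1],
    ["Afghanistan", "Male", 70.4, 20.9, 9.8, 9.3, 10.5, 15.1, 16.3, 4.8],
    ["Afghanistan", "Female", 20.1, 2.3, 1.4, 1.6, 2.3, 2.7, 3.5, 1.2],
    ["Albania", "Both sexes", 16.3, 8.3, 6.0, 7.8, 9.1, 6.1, 6.5, 5.0],
    ["Albania", "Male", 23.2, 11.9, 8.1, 11.4, 13.5, 8.8, 6.3, 3.1],
]
sample_rates = pd.DataFrame(rows, columns=["Country", "Sex"] + age_cols)


def to_long(df):
    """Reshape wide age-range columns into one row per country, sex, and age group."""
    out = df.copy()
    # Defensive cleanup: strip stray whitespace from headers and labels.
    out.columns = out.columns.str.strip()
    out["Country"] = out["Country"].str.strip()
    out["Sex"] = out["Sex"].str.strip()
    value_cols = [c for c in out.columns if c not in ("Country", "Sex")]
    return out.melt(id_vars=["Country", "Sex"], value_vars=value_cols,
                    var_name="Age_group", value_name="Rate")


long_rates = to_long(sample_rates)
print(long_rates.head())
```

On the full data, `to_long(df_crude_rates).groupby(["Sex", "Age_group"])["Rate"].mean()` would give one way to compare average rates across groups.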

In [4]:

print(df_crude_rates.describe())

(1%) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do. For example:

Machine learning can be used to gain insights into the data and to develop models that identify populations (by country, sex, and age group) that might be at higher risk for suicide. Algorithms can analyze correlations in the data to see which factors, such as the availability of mental health workers, are associated with higher rates.
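A first step toward that kind of analysis would be to join the two tables so that each country has its workforce figures next to its suicide rates. This is a minimal sketch under stated assumptions, not the project's actual method: `build_modeling_table` is a hypothetical helper, the tiny frames reuse a few values from the printed previews, and joining on `Country` using only the `Both sexes` rows is one possible choice among several.

```python
import pandas as pd


def build_modeling_table(resources, rates):
    """Join workforce figures to the 'Both sexes' suicide rates by country."""
    rates = rates.copy()
    resources = resources.copy()
    # Defensive cleanup so that join keys and labels compare equal.
    rates["Country"] = rates["Country"].str.strip()
    rates["Sex"] = rates["Sex"].str.strip()
    resources["Country"] = resources["Country"].str.strip()
    both = rates[rates["Sex"] == "Both sexes"].drop(columns="Sex")
    # Inner join: keep only countries present in both tables.
    return resources.merge(both, on="Country", how="inner")


# Tiny stand-ins built from values in the previews above.
resources = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Angola"],
    "Psychiatrists": [0.231, 1.471, 0.057],
    "Psychologists": [0.296, 1.231, 0.179],
})
rates = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Albania"],
    "Sex": ["Both sexes", "Male", "Both sexes"],
    "20to29": [10.2, 16.3, 6.5],
})

table = build_modeling_table(resources, rates)
print(table)

# With the full data, pairwise correlations could be inspected with:
# table.drop(columns="Country").corr()
```

Two countries are far too few for a meaningful correlation, so the stand-in only demonstrates the join; the correlation step would be run on `build_modeling_table(df_resources, df_crude_rates)`.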

mental health