(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).
Data science could be exponentially valuable for learning more about suicide rates and how it differs across age ranges and respective home countries. It's really important for us all to explore the statistics around mental health issues and how they affect people worldwide. Data science could help to establish some intervention and prevention strategies, build predictive models, and provide insight and make inferences about the distribution of these statistics. Data science can help us all to gain a better understanding of current suicide rates, information surrounding the current mental health resources available, and help develop effective strategies to prevent them.
There are some recent studies from the CDC and ICF that show how data science can help to reveal trends amonst suicide and mental health issues. In order to learn more and make sense of this issue, data science could provide useful in terms of discovering commonalities amongst worldwide cases.
(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.
import pandas as pd
# establish data dictionary, read, and explore dataset
Column Name | Description |
---|---|
Psychiatrists | # working in mental health sector (per 100,000 population) |
Nurses | # working in mental health sector (per 100,000 population) |
Social Workers | # working in mental health sector (per 100,000 population) |
Psychologists | # working in mental health sector (per 100,000 population) |
df_resources = pd.read_csv('human_resources.csv')
print(df_resources.head())
Country Year Psychiatrists Nurses Social_workers \ 0 Afghanistan 2016 0.231 0.098 NaN 1 Albania 2016 1.471 6.876 1.060 2 Angola 2016 0.057 0.660 0.022 3 Antigua and Barbuda 2016 1.001 7.005 4.003 4 Argentina 2016 21.705 NaN NaN Psychologists 0 0.296 1 1.231 2 0.179 3 NaN 4 222.572
Column Name | Description |
---|---|
Sex | Represents the suicide rates for three different values: people who identify as female, male, or both |
Ages | The rest of these columns represent the age ranges for these suicide rates |
df_crude_rates = pd.read_csv('crude_suicide_rates.csv')
print(df_crude_rates.head())
Country Sex 80_above 70to79 60to69 50to59 40to49 \ 0 Afghanistan Both sexes 42.0 11.0 5.5 5.6 6.6 1 Afghanistan Male 70.4 20.9 9.8 9.3 10.5 2 Afghanistan Female 20.1 2.3 1.4 1.6 2.3 3 Albania Both sexes 16.3 8.3 6.0 7.8 9.1 4 Albania Male 23.2 11.9 8.1 11.4 13.5 30to39 20to29 10to19 0 9.2 10.2 3.1 1 15.1 16.3 4.8 2 2.7 3.5 1.2 3 6.1 6.5 5.0 4 8.8 6.3 3.1
print(df_rates.describe())
--------------------------------------------------------------------------- NameError Traceback (most recent call last) /var/folders/jr/993ryp_95k5ccq72s9k086wm0000gn/T/ipykernel_71732/2050843025.py in <module> ----> 1 print(df_rates.describe()) NameError: name 'df_rates' is not defined
(1%) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do. For example:
Machine learning can be used to gain insights into the data and develop models that can identify individuals who might be at risk for suicide. Algorithms can analyze correlations in teh data to see what factors contribute to these high rates.