I want to analyze the impact of social media on mental health. According to a recent survey, 62% of young adults say that social media has negatively affected their mental health. By analyzing data on social media usage alongside mental health indicators such as depression and anxiety rates, we can better understand the correlation between the two and identify factors that may contribute to poor mental health. This could ultimately help social media companies develop strategies for improving their platforms to promote positive mental health.
We can use two datasets from the Pew Research Center's 2018 and 2022 studies on Teens, Social Media and Technology. These datasets contain information on social media and technology use among teens in the United States. Using two datasets gives us the survey year as an additional variable when comparing data. For example, one trend I was able to identify was a rise in the share of young adults stating that their mental health had been negatively impacted by social media (53% -> 62%).
The columns of the 2018 dataset are as follows:
import pandas as pd
# Read the CSV file into a Pandas DataFrame
df = pd.read_csv('pew_2018.csv')
# Extract the header row as a list
header = list(df.columns)
# Print the header row
print(header)
['CASEID', 'SURV_LANG', 'FITIN', 'FRIEND1', 'FRIEND2', 'FRIEND3', 'FRIEND4_1', 'FRIEND4_2', 'FRIEND4_3', 'FRIEND4_4', 'FRIEND4_5', 'FRIEND4_6', 'FRIEND5', 'FRIEND6_1', 'FRIEND6_2', 'FRIEND6_3', 'FRIEND6_4', 'FRIEND6_5', 'FRIEND6_6', 'FRIEND6_7', 'DEVICEA', 'DEVICEB', 'DEVICEC', 'DEVICED', 'HOMEWORKA', 'HOMEWORKB', 'HOMEWORKC', 'INTREQ', 'GAMING', 'SNS1_1', 'SNS1_2', 'SNS1_3', 'SNS1_4', 'SNS1_5', 'SNS1_6', 'SNS1_7', 'SNS1_8', 'SNS2', 'SOC1', 'SOC1A_GOOD_1', 'SOC1A_GOOD_2', 'SOC1A_GOOD_3', 'SOC1A_GOOD_4', 'SOC1A_GOOD_5', 'SOC1A_GOOD_6', 'SOC1A_GOOD_7', 'SOC1A_BAD_1', 'SOC1A_BAD_2', 'SOC1A_BAD_3', 'SOC1A_BAD_4', 'SOC1A_BAD_5', 'SOC1A_BAD_6', 'SOC1A_BAD_7', 'SOC1A_OTHER', 'SOC1A_DK_REF', 'POST1A', 'POST1B', 'POST1C', 'POST1D', 'POST1E', 'POST2_1', 'POST2_2', 'POST2_3', 'POST2_4', 'POST2_5', 'POST2_6', 'POST2_7', 'POST2_8', 'SOC2POSA', 'SOC2POSB', 'SOC2POSC', 'SOC2POSD', 'SOC2NEGA', 'SOC2NEGB', 'SOC2NEGC', 'SOC2NEGD', 'SOC4A', 'SOC4B', 'SOC4C', 'SOC4D', 'SOC5A', 'SOC5B', 'SOC5C', 'SOC6', 'SOC7_1', 'SOC7_2', 'SOC7_3', 'SOC7_4', 'SOC7_5', 'SOC7_6', 'SOCEXPA', 'SOCEXPB', 'SOCEXPC', 'SOCEXPD', 'WORRYA', 'WORRYB', 'WORRYC', 'LIMITA', 'LIMITB', 'LIMITC', 'CELL1_1', 'CELL1_2', 'CELL1_3', 'CELL1_4', 'CELL1_5', 'CELL1_6', 'CELL2A', 'CELL2B', 'CELL2C', 'CELL2D', 'CELL3A', 'CELL3B', 'CELL3C', 'DISTRACT', 'GROUP1', 'GROUP2_1', 'GROUP2_2', 'GROUP2_3', 'GROUP2_4', 'GROUP2_5', 'GROUP2_6', 'GROUP2_7', 'GROUP2_8', 'GROUP2_9', 'GROUP2_10', 'GROUP2_11', 'GROUP3A', 'GROUP3B', 'GROUP3C', 'GROUP3D', 'OH1A', 'OH1B', 'OH1C', 'OH1D', 'OH2A', 'OH2B', 'OH2C', 'OH2D', 'OH2E', 'OH2F', 'OH3_1', 'OH3_2', 'OH3_3', 'OH3_4', 'OH3_5', 'OH3_6', 'OH3_7', 'GUN1', 'GUN2A', 'GUN2B', 'GUN2C', 'GUN2D', 'GUN2E', 'GENDER', 'AGE', 'P_EDUC', 'RACETHNICITY', 'HOME_TYPE', 'HOUSING', 'INCOME', 'INTERNET', 'PHONESERVICE', 'METRO', 'REGION4', 'HHSIZE', 'HH01', 'HH25', 'HH612', 'HH1317', 'HH18OV', 'CO_DATE', 'DURATION', 'SURV_MODE', 'MODE_END', 'DEVICE', 'WEIGHT']
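Since the column list ends with a WEIGHT column, comparisons between the two survey years should probably use weighted shares rather than raw counts. Below is a minimal sketch of that calculation on synthetic rows. The YEAR column and the 0/1 NEG_IMPACT flag are hypothetical names I made up for illustration; they are not columns in the Pew files, and the numbers are not real survey results.

```python
import pandas as pd

# Synthetic stand-in rows. In practice these would come from the two
# survey files, stacked together with a YEAR column added by hand.
# NEG_IMPACT is a hypothetical 0/1 flag, not an actual Pew column.
survey = pd.DataFrame({
    "YEAR": [2018, 2018, 2018, 2018, 2022, 2022, 2022, 2022],
    "NEG_IMPACT": [1, 0, 1, 0, 1, 1, 0, 1],
    "WEIGHT": [1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0],
})


def weighted_share(group: pd.DataFrame) -> float:
    """Weighted proportion of respondents with NEG_IMPACT == 1."""
    weighted_yes = (group["NEG_IMPACT"] * group["WEIGHT"]).sum()
    return weighted_yes / group["WEIGHT"].sum()


# One weighted share per survey year
shares = {year: weighted_share(group) for year, group in survey.groupby("YEAR")}

# Change between the two years, in percentage points
change_pp = (shares[2022] - shares[2018]) * 100

print(shares)
print(f"Change: {change_pp:.1f} percentage points")
```

The same function could be reused for any yes/no survey item once the relevant column has been recoded to 0/1.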
We can use machine learning methods such as clustering and classification to identify patterns and trends in social media usage and mental health indicators. For example, we can cluster teens based on their social media use and compare the mental health indicators across different clusters to identify any significant differences. Additionally, we can use classification algorithms to predict the mental health status of a teen based on their social media usage patterns. These approaches can provide valuable insights into the relationship between social media use and mental health among teens.
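The clustering and classification ideas above can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the data is randomly generated, the column names (hours_per_day, n_platforms, reports_worry) are hypothetical and do not come from the Pew datasets, and k-means with logistic regression is just one reasonable pairing of methods, not a choice the project has settled on.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data; all three columns are hypothetical.
hours = rng.uniform(0, 8, size=n)
platforms = rng.integers(1, 7, size=n)
# Toy outcome loosely tied to usage so the classifier has a signal to learn
worry = (hours + rng.normal(0, 2, size=n) > 4).astype(int)
teens = pd.DataFrame({
    "hours_per_day": hours,
    "n_platforms": platforms,
    "reports_worry": worry,
})
features = ["hours_per_day", "n_platforms"]

# Clustering: group teens by usage pattern, then compare the outcome
# rate across the resulting clusters.
scaled = StandardScaler().fit_transform(teens[features])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
teens["cluster"] = kmeans.fit_predict(scaled)
worry_by_cluster = teens.groupby("cluster")["reports_worry"].mean()

# Classification: predict the outcome from usage features and check
# accuracy on rows held out from training.
X_train, X_test, y_train, y_test = train_test_split(
    teens[features],
    teens["reports_worry"],
    test_size=0.25,
    random_state=0,
    stratify=teens["reports_worry"],
)
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(worry_by_cluster)
print(f"Held-out accuracy: {accuracy:.2f}")
```

With the real survey data, the feature columns would be replaced by recoded survey items, and differences between clusters would still need a significance test before drawing conclusions.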