Music clustering¶

Problem:¶

Music has been shown to have positive effects on mental health, which motivates a closer look at that relationship. Data science may provide further insight into how specific music genres and listening habits relate to mental health conditions (namely anxiety, depression, OCD, and insomnia). A Kaggle dataset of survey responses is used for this exploration.

Solution:¶

For each mental health condition, cluster the genres that respondents listen to most frequently and compare respondents who reported that music improves their mental health with those who reported no effect. Do the same for listening habits (e.g. hours per day, listening while working). This may reveal which genres and listening habits are associated with improvement for specific conditions.
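A minimal sketch of what this could look like, under several assumptions: the rows are invented toy data (not survey responses), the cut-off of 6 for an "elevated" score is arbitrary, the frequency answers are treated as equally spaced ordinal codes, and the small `kmeans` helper is a hypothetical stand-in written in plain NumPy. A library implementation of k-means would be the more usual choice in practice.

```python
import numpy as np
import pandas as pd

# Toy responses, invented for illustration (not taken from the survey).
toy = pd.DataFrame({
    "Anxiety": [8, 7, 9, 8, 2, 7],
    "Frequency [Lofi]": ["Very frequently", "Very frequently", "Never",
                         "Rarely", "Never", "Very frequently"],
    "Frequency [Metal]": ["Never", "Rarely", "Very frequently",
                          "Sometimes", "Sometimes", "Never"],
    "Music effects": ["Improve", "Improve", "No effect",
                      "No effect", "Improve", "Improve"],
})

# Assumption: the four answers are equally spaced on an ordinal scale.
FREQ_ORDER = {"Never": 0, "Rarely": 1, "Sometimes": 2, "Very frequently": 3}
genre_cols = [c for c in toy.columns if c.startswith("Frequency [")]

# Keep respondents with an elevated anxiety score (the threshold is an assumption).
subset = toy[toy["Anxiety"] >= 6].copy()
X = subset[genre_cols].apply(lambda col: col.map(FREQ_ORDER)).to_numpy(dtype=float)


def kmeans(X, k, n_iter=20):
    """Plain Lloyd's algorithm, seeded with the first k rows for reproducibility."""
    centers = X[:k].copy()
    for _ in range(n_iter):
        # Assign each row to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its rows; leave empty clusters in place.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers


subset["cluster"], centers = kmeans(X, k=2)

# Cross-tabulate cluster membership against the reported effect of music.
effect_by_cluster = pd.crosstab(subset["cluster"], subset["Music effects"])
print(effect_by_cluster)
```

On this toy data the two clusters separate the lofi-heavy listeners from the metal-heavy ones. Whether anything comparable appears in the real survey is exactly what the analysis would need to check.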

Impact:¶

Ultimately, understanding the specific ways music impacts mental health can inform music therapy and help people manage their mental health.

Relevant sources:

  • https://pubmed.ncbi.nlm.nih.gov/26066780/
  • https://www.talkspace.com/mental-health/conditions/articles/music-for-anxiety-management/

Dataset:¶

Data collected via Google Forms survey¶

Block 0: Background¶

  • respondents answer questions about their listening preferences

Block 1: Music genres¶

  • respondents report how often they listen to each music genre
    • Never
    • Rarely
    • Sometimes
    • Very frequently
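Because these four answers are ordered, they can be turned into numbers before any clustering. The sketch below is one possible encoding. The names `FREQ_LEVELS`, `FREQ_CODES`, and `encode_frequency` are hypothetical, and treating the levels as evenly spaced (0 to 3) is an assumption rather than something the survey states.

```python
import pandas as pd

# Ordered from least to most frequent, matching the survey's answer options.
FREQ_LEVELS = ["Never", "Rarely", "Sometimes", "Very frequently"]
FREQ_CODES = {level: code for code, level in enumerate(FREQ_LEVELS)}


def encode_frequency(series: pd.Series) -> pd.Series:
    """Map frequency labels to ordinal codes; unknown or missing answers become NaN."""
    return series.map(FREQ_CODES)


toy = pd.Series(["Sometimes", "Very frequently", "Never", "Rarely"],
                name="Frequency [Rock]")
encoded = encode_frequency(toy)
print(encoded.tolist())
```

Mapping through a dictionary leaves unexpected or missing answers as NaN, so they stay visible instead of being silently assigned a code.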

Block 2: Mental health¶

  • respondents rate their anxiety, depression, insomnia, and OCD levels on a scale of 0-10
    • 0 - I do not experience this.
    • 10 - I experience this regularly, constantly/or to an extreme.
  • respondents report whether music has an effect on their mental health
    • Improve
    • No effect
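One way to relate the 0-10 scores to the reported effect is to flag a score as "elevated" above some cut-off and cross-tabulate it against `Music effects`. The sketch below uses the Insomnia and Music effects values from the first five rows of the survey file as displayed by `df_music.head()`. The helper `effect_table` and the threshold of 6 are assumptions made for illustration only.

```python
import pandas as pd

# Values from the first five survey rows as displayed by df_music.head().
sample = pd.DataFrame({
    "Insomnia": [1.0, 2.0, 10.0, 3.0, 5.0],
    "Music effects": [None, None, "No effect", "Improve", "Improve"],
})


def effect_table(df, condition, threshold=6):
    """Cross-tabulate elevated vs. not elevated scores against the reported effect.

    The threshold is an arbitrary assumption, not a clinical cut-off.
    """
    # Drop respondents who skipped either question.
    answered = df.dropna(subset=[condition, "Music effects"])
    level = (answered[condition] >= threshold).map(
        {True: "elevated", False: "not elevated"}
    )
    return pd.crosstab(level.rename(condition), answered["Music effects"])


table = effect_table(sample, "Insomnia")
print(table)
```

Five rows are far too few to draw any conclusion. The point is only the shape of the table that the full dataset would fill in.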
In [4]:
import pandas as pd

# Load the survey responses (one row per respondent) and preview the first rows.
df_music = pd.read_csv('mxmh_survey_results.csv')
df_music.head()
Out[4]:
  | Timestamp | Age | Primary streaming service | Hours per day | While working | Instrumentalist | Composer | Fav genre | Exploratory | Foreign languages | ... | Frequency [R&B] | Frequency [Rap] | Frequency [Rock] | Frequency [Video game music] | Anxiety | Depression | Insomnia | OCD | Music effects | Permissions
0 | 8/27/2022 19:29:02 | 18.0 | Spotify | 3.0 | Yes | Yes | Yes | Latin | Yes | Yes | ... | Sometimes | Very frequently | Never | Sometimes | 3.0 | 0.0 | 1.0 | 0.0 | NaN | I understand.
1 | 8/27/2022 19:57:31 | 63.0 | Pandora | 1.5 | Yes | No | No | Rock | Yes | No | ... | Sometimes | Rarely | Very frequently | Rarely | 7.0 | 2.0 | 2.0 | 1.0 | NaN | I understand.
2 | 8/27/2022 21:28:18 | 18.0 | Spotify | 4.0 | No | No | No | Video game music | No | Yes | ... | Never | Rarely | Rarely | Very frequently | 7.0 | 7.0 | 10.0 | 2.0 | No effect | I understand.
3 | 8/27/2022 21:40:40 | 61.0 | YouTube Music | 2.5 | Yes | No | Yes | Jazz | Yes | Yes | ... | Sometimes | Never | Never | Never | 9.0 | 7.0 | 3.0 | 3.0 | Improve | I understand.
4 | 8/27/2022 21:54:47 | 18.0 | Spotify | 4.0 | Yes | No | No | R&B | Yes | No | ... | Very frequently | Very frequently | Never | Rarely | 7.0 | 2.0 | 5.0 | 9.0 | Improve | I understand.

5 rows × 33 columns
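The preview shows `NaN` in `Music effects` for the first two respondents, so missing answers need handling before the groups can be compared. A small sketch of that check follows, using a few values from the five rows displayed above. Dropping the incomplete rows is one possible choice and is an assumption here, not a step the notebook has taken.

```python
import numpy as np
import pandas as pd

# A few columns copied from the five rows displayed in the head() output.
sample = pd.DataFrame({
    "Fav genre": ["Latin", "Rock", "Video game music", "Jazz", "R&B"],
    "Anxiety": [3.0, 7.0, 7.0, 9.0, 7.0],
    "Music effects": [np.nan, np.nan, "No effect", "Improve", "Improve"],
})

# Count missing values per column.
missing = sample.isna().sum()
print(missing)

# Keep only rows where the respondent answered the music-effects question.
answered = sample.dropna(subset=["Music effects"])
print(len(sample), "->", len(answered), "rows")
```

Running `df_music.isna().sum()` on the full file would show how many responses are affected in each of the 33 columns.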

In [26]:
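# Split the 33 columns into the three survey blocks by position.
columns_list = df_music.columns

block_0 = columns_list[:11]    # background and listening habits
block_1 = columns_list[11:27]  # per-genre listening frequency
block_2 = columns_list[27:33]  # mental health scores, music effects, permissions

print(f'Block 0: {block_0},\n\n Block 1: {block_1},\n\n Block 2: {block_2}')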
Block 0: Index(['Timestamp', 'Age', 'Primary streaming service', 'Hours per day',
       'While working', 'Instrumentalist', 'Composer', 'Fav genre',
       'Exploratory', 'Foreign languages', 'BPM'],
      dtype='object'),

 Block 1: Index(['Frequency [Classical]', 'Frequency [Country]', 'Frequency [EDM]',
       'Frequency [Folk]', 'Frequency [Gospel]', 'Frequency [Hip hop]',
       'Frequency [Jazz]', 'Frequency [K pop]', 'Frequency [Latin]',
       'Frequency [Lofi]', 'Frequency [Metal]', 'Frequency [Pop]',
       'Frequency [R&B]', 'Frequency [Rap]', 'Frequency [Rock]',
       'Frequency [Video game music]'],
      dtype='object'),

 Block 2: Index(['Anxiety', 'Depression', 'Insomnia', 'OCD', 'Music effects',
       'Permissions'],
      dtype='object')
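The positional slices work for this file, but they sweep `Timestamp` into Block 0 and `Permissions` into Block 2, and neither is an analysis variable. As a hedged alternative, the columns can be selected by name. The sketch below uses an abbreviated subset of the column names printed above, and the variable names are hypothetical.

```python
import pandas as pd

# Abbreviated subset of the column names printed above.
columns = pd.Index([
    "Timestamp", "Age", "Hours per day", "While working", "BPM",
    "Frequency [Classical]", "Frequency [Rock]",
    "Anxiety", "Depression", "Insomnia", "OCD",
    "Music effects", "Permissions",
])

# Genre columns all share the "Frequency [...]" naming pattern.
prefix = "Frequency ["
genre_cols = columns[columns.str.startswith(prefix)]
genres = genre_cols.str[len(prefix):-1]  # strip the prefix and the closing bracket

# Condition scores are listed explicitly; administrative fields are excluded.
condition_cols = ["Anxiety", "Depression", "Insomnia", "OCD"]
admin_cols = ["Timestamp", "Permissions"]
background_cols = [
    c for c in columns
    if c not in genre_cols
    and c not in condition_cols
    and c not in admin_cols
    and c != "Music effects"
]

print(list(genres))
print(background_cols)
```

Selecting by name keeps the blocks correct even if the column order changes, for example after a column is dropped or added.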