Music clustering

Problem:

Music has been shown to have positive effects on mental health, which motivates a closer look at the relationship. Data science may offer further insight into how specific music genres and listening habits relate to mental health, namely anxiety, depression, OCD, and insomnia. This exploration uses a Kaggle dataset.

Solution:

Cluster the genres listened to most frequently by respondents with a specific mental health condition, comparing those who reported improvement with those who reported no effect. Similarly, cluster by listening habits (e.g. hours per day, listening while working). Doing so may help identify which genres and listening habits appear to have a positive impact on specific conditions.
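
The text above does not name a clustering algorithm, so the following is only a sketch of one possible approach: plain k-means over ordinal-coded genre frequencies (0 = Never up to 3 = Very frequently). The toy matrix, the helpers `farthest_first` and `kmeans`, and the choice of `k=2` are all illustrative assumptions, not the survey data or a confirmed method.

```python
import numpy as np

def farthest_first(X, k):
    """Deterministic initialisation (hypothetical helper): start from row 0,
    then repeatedly add the row farthest from the centroids chosen so far."""
    idx = [0]
    for _ in range(1, k):
        # Distance from every row to its nearest already-chosen row.
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].copy()

def kmeans(X, k, n_iter=50):
    """Minimal k-means (hypothetical helper): returns (labels, centroids)."""
    centroids = farthest_first(X, k)
    for _ in range(n_iter):
        # Euclidean distance from every row to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

# Toy data (made up): rows are respondents, columns are three genres,
# values are ordinal codes from 0 (Never) to 3 (Very frequently).
X = np.array([
    [3, 3, 0],
    [3, 2, 0],
    [2, 3, 0],
    [0, 0, 3],
    [0, 1, 3],
    [1, 0, 3],
], dtype=float)

labels, centroids = kmeans(X, k=2)
print(labels)
```

On real survey data the number of clusters would need to be chosen and validated rather than fixed at two, and other algorithms may suit ordinal answers better.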

Impact:

Ultimately, understanding the specific ways music impacts mental health can inform music therapy and help people manage their mental health.

Relevant sources:

Dataset:

Data collected via a Google Forms survey

Block 0: Background

Respondents answer questions about their listening preferences.

Block 1: Music genres

Respondents report how frequently they listen to specific music genres:
- Never
- Rarely
- Sometimes
- Very frequently
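
Because these four answers are ordered, they can be mapped to integers before any distance-based analysis. The sketch below assumes a 0 to 3 coding and uses a made-up two-column frame; `FREQ_ORDER`, `toy`, and `freq_codes` are illustrative names, not part of the dataset.

```python
import pandas as pd

# Assumed ordinal coding of the four survey answers (0 = least frequent).
FREQ_ORDER = {"Never": 0, "Rarely": 1, "Sometimes": 2, "Very frequently": 3}

# Toy stand-in for two of the 'Frequency [...]' columns.
toy = pd.DataFrame({
    "Frequency [Rock]": ["Never", "Very frequently", "Rarely"],
    "Frequency [Jazz]": ["Sometimes", "Never", "Never"],
})

# Map every label to its code, column by column; unknown labels become NaN.
freq_codes = toy.apply(lambda col: col.map(FREQ_ORDER))
print(freq_codes)
```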

Block 2: Mental health

Respondents rate their anxiety, depression, insomnia, and OCD levels on a scale of 0-10:
- 0 - I do not experience this.
- 10 - I experience this regularly, constantly/or to an extreme.

Respondents indicate whether music has an effect on their mental health:
- Improve
- No effect
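
One way to prepare the comparison described under Solution is to keep respondents above some score cutoff for a condition and then split them by their 'Music effects' answer. This is a sketch only: the cutoff of 5 and the toy rows are assumptions, and `ANXIETY_THRESHOLD`, `subset`, and `mean_hours` are illustrative names.

```python
import pandas as pd

# Toy rows (made up) using column names from the survey.
toy = pd.DataFrame({
    "Anxiety": [7, 2, 9, 8, 6],
    "Hours per day": [3.0, 1.5, 4.0, 2.0, 1.0],
    "Music effects": ["Improve", "Improve", "No effect", "Improve", None],
})

ANXIETY_THRESHOLD = 5  # assumed cutoff, not defined by the survey

# Keep respondents above the cutoff who answered the 'Music effects' question.
subset = toy[(toy["Anxiety"] > ANXIETY_THRESHOLD) & toy["Music effects"].notna()]

# Compare average listening time between the answer groups.
mean_hours = subset.groupby("Music effects")["Hours per day"].mean()
print(mean_hours)
```

The same pattern applies to the other three conditions by swapping the score column.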

In [4]:

import pandas as pd

# Load the survey responses (the CSV is assumed to be in the working directory)
df_music = pd.read_csv('mxmh_survey_results.csv')
df_music.head()

Out[4]:

	Timestamp	Age	Primary streaming service	Hours per day	While working	Instrumentalist	Composer	Fav genre	Exploratory	Foreign languages	...	Frequency [R&B]	Frequency [Rap]	Frequency [Rock]	Frequency [Video game music]	Anxiety	Depression	Insomnia	OCD	Music effects	Permissions
0	8/27/2022 19:29:02	18.0	Spotify	3.0	Yes	Yes	Yes	Latin	Yes	Yes	...	Sometimes	Very frequently	Never	Sometimes	3.0	0.0	1.0	0.0	NaN	I understand.
1	8/27/2022 19:57:31	63.0	Pandora	1.5	Yes	No	No	Rock	Yes	No	...	Sometimes	Rarely	Very frequently	Rarely	7.0	2.0	2.0	1.0	NaN	I understand.
2	8/27/2022 21:28:18	18.0	Spotify	4.0	No	No	No	Video game music	No	Yes	...	Never	Rarely	Rarely	Very frequently	7.0	7.0	10.0	2.0	No effect	I understand.
3	8/27/2022 21:40:40	61.0	YouTube Music	2.5	Yes	No	Yes	Jazz	Yes	Yes	...	Sometimes	Never	Never	Never	9.0	7.0	3.0	3.0	Improve	I understand.
4	8/27/2022 21:54:47	18.0	Spotify	4.0	Yes	No	No	R&B	Yes	No	...	Very frequently	Very frequently	Never	Rarely	7.0	2.0	5.0	9.0	Improve	I understand.

5 rows × 33 columns

In [26]:

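columns_list = df_music.columns

# Split the 33 columns into the three survey blocks by position
block_0 = columns_list[:11]    # background and listening habits
block_1 = columns_list[11:27]  # genre listening frequencies
block_2 = columns_list[27:33]  # mental health scores, music effects, permissions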

print(f'Block 0: {block_0},\n\n Block 1: {block_1},\n\n Block 2: {block_2}')

Block 0: Index(['Timestamp', 'Age', 'Primary streaming service', 'Hours per day',
       'While working', 'Instrumentalist', 'Composer', 'Fav genre',
       'Exploratory', 'Foreign languages', 'BPM'],
      dtype='object'),

 Block 1: Index(['Frequency [Classical]', 'Frequency [Country]', 'Frequency [EDM]',
       'Frequency [Folk]', 'Frequency [Gospel]', 'Frequency [Hip hop]',
       'Frequency [Jazz]', 'Frequency [K pop]', 'Frequency [Latin]',
       'Frequency [Lofi]', 'Frequency [Metal]', 'Frequency [Pop]',
       'Frequency [R&B]', 'Frequency [Rap]', 'Frequency [Rock]',
       'Frequency [Video game music]'],
      dtype='object'),

 Block 2: Index(['Anxiety', 'Depression', 'Insomnia', 'OCD', 'Music effects',
       'Permissions'],
      dtype='object')
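
Positional slices such as `columns_list[11:27]` return the wrong columns without warning if a column is added or reordered. A sketch of a name-based alternative follows, using a shortened, made-up column list; `columns`, `genre_cols`, and `health_cols` are illustrative names.

```python
import pandas as pd

# Shortened stand-in for df_music.columns.
columns = pd.Index([
    "Age", "Hours per day", "BPM",
    "Frequency [Classical]", "Frequency [Rock]",
    "Anxiety", "Depression", "Insomnia", "OCD", "Music effects",
])

# Block 1: every genre column shares the 'Frequency [' prefix.
genre_cols = columns[columns.str.startswith("Frequency [")]

# Block 2 scores: the four self-rated conditions, listed explicitly by name.
health_cols = ["Anxiety", "Depression", "Insomnia", "OCD"]

print(list(genre_cols))
```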
