Song Recommendation Generator

Description and Motivation:

Problem:

Since the pandemic started in 2020, the number of people using music streaming services has grown by 26.4% to 523.9 million. Music clearly matters to a large number of people, yet it is often hard to find a new song to listen to, or songs that are similar to ones you already like.

Solution:

This generator will take in one of your favorite songs and will produce a new song for you to listen to that has a similar feel to it.


Dataset

I am using the Kaggle Spotify datasets, which include the top 100 songs of 2017 and record the following attributes for each song:

  • id
  • name
  • artists
  • danceability
  • energy
  • key
  • loudness
  • mode
  • speechiness
  • acousticness
  • instrumentalness
  • liveness
  • valence
  • tempo
  • duration_ms
  • time_signature -- time signature of the song (4.0 = 4/4)
In [8]:
data_dict = {'id': 'unique track ID, only meaningful within Spotify',
             'name': 'name of the song',
             'artists': 'artist or artists performing the song',
             'danceability': 'subjective measure of how danceable the song is (0 through 1)',
             'energy': 'subjective measure of how much energy the song gives you (0 through 1)',
             'key': 'key of the song as a pitch-class integer (0 = C, 1 = C#/Db, 2 = D, and so on)',
             'loudness': 'overall loudness in decibels (typically -60 to 0; values closer to 0 are louder)',
             'mode': 'modality of the song (1 = major, 0 = minor)',
             'speechiness': 'subjective measure of how much spoken word the song contains (0 through 1)',
             'acousticness': 'subjective measure of how acoustic the song is (0 through 1)',
             'instrumentalness': 'subjective measure of how likely the song is to have no vocals (0 through 1)',
             'liveness': 'subjective measure of how likely the song was recorded live with an audience (0 through 1)',
             'valence': 'subjective measure of how musically positive (happy, cheerful) the song sounds (0 through 1)',
             'tempo': 'beats per minute of the song',
             'duration_ms': 'duration of the song in milliseconds',
             'time_signature': 'time signature of the song (4.0 = 4/4)'}

data_dict
In [4]:
import pandas as pd

# Load the top-100 songs of 2017 (one row per song, one column per attribute)
df_music = pd.read_csv("spotify2017.csv")
df_music
Out[4]:
| | id | name | artists | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7qiZfU4dY1lWllzX7mPBI | Shape of You | Ed Sheeran | 0.825 | 0.652 | 1.0 | -3.183 | 0.0 | 0.0802 | 0.5810 | 0.000000 | 0.0931 | 0.9310 | 95.977 | 233713.0 | 4.0 |
| 1 | 5CtI0qwDJkDQGwXD1H1cL | Despacito - Remix | Luis Fonsi | 0.694 | 0.815 | 2.0 | -4.328 | 1.0 | 0.1200 | 0.2290 | 0.000000 | 0.0924 | 0.8130 | 88.931 | 228827.0 | 4.0 |
| 2 | 4aWmUDTfIPGksMNLV2rQP | Despacito (Featuring Daddy Yankee) | Luis Fonsi | 0.660 | 0.786 | 2.0 | -4.757 | 1.0 | 0.1700 | 0.2090 | 0.000000 | 0.1120 | 0.8460 | 177.833 | 228200.0 | 4.0 |
| 3 | 6RUKPb4LETWmmr3iAEQkt | Something Just Like This | The Chainsmokers | 0.617 | 0.635 | 11.0 | -6.769 | 0.0 | 0.0317 | 0.0498 | 0.000014 | 0.1640 | 0.4460 | 103.019 | 247160.0 | 4.0 |
| 4 | 3DXncPQOG4VBw3QHh3S81 | I'm the One | DJ Khaled | 0.609 | 0.668 | 7.0 | -4.284 | 1.0 | 0.0367 | 0.0552 | 0.000000 | 0.1670 | 0.8110 | 80.924 | 288600.0 | 4.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 1PSBzsahR2AKwLJgx8ehB | Bad Things (with Camila Cabello) | Machine Gun Kelly | 0.675 | 0.690 | 2.0 | -4.761 | 1.0 | 0.1320 | 0.2100 | 0.000000 | 0.2870 | 0.2720 | 137.817 | 239293.0 | 4.0 |
| 96 | 0QsvXIfqM0zZoerQfsI9l | Don't Let Me Down | The Chainsmokers | 0.542 | 0.859 | 11.0 | -5.651 | 1.0 | 0.1970 | 0.1600 | 0.004660 | 0.1370 | 0.4030 | 159.797 | 208053.0 | 4.0 |
| 97 | 7mldq42yDuxiUNn08nvzH | Body Like A Back Road | Sam Hunt | 0.731 | 0.469 | 5.0 | -7.226 | 1.0 | 0.0326 | 0.4630 | 0.000001 | 0.1030 | 0.6310 | 98.963 | 165387.0 | 4.0 |
| 98 | 7i2DJ88J7jQ8K7zqFX2fW | Now Or Never | Halsey | 0.658 | 0.588 | 6.0 | -4.902 | 0.0 | 0.0367 | 0.1050 | 0.000001 | 0.1250 | 0.4340 | 110.075 | 214802.0 | 4.0 |
| 99 | 1j4kHkkpqZRBwE0A4CN4Y | Dusk Till Dawn - Radio Edit | ZAYN | 0.258 | 0.437 | 11.0 | -6.593 | 0.0 | 0.0390 | 0.1010 | 0.000001 | 0.1060 | 0.0967 | 180.043 | 239000.0 | 4.0 |

100 rows × 16 columns

Solution

Create an algorithm that scores how similar two songs are from the differences between their values in each feature column. The machine learning component takes in several songs that one person likes and considers similar, and from them learns a weight for each column. The similarity score then becomes a weighted average in which the columns that matter most for that listener count more.
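The two steps described above, scoring similarity from per-column differences and learning per-column weights from songs a listener considers alike, can be sketched in plain Python. This is a minimal sketch under several assumptions, not the author's implementation: it compares only the continuous audio features, min-max scales them so that tempo (roughly 80 to 180 BPM) does not swamp the 0-to-1 columns, and swaps in a simple inverse-spread heuristic as a stand-in for the learning step, so columns on which the liked songs agree get larger weights. The helper names `min_max_scale`, `learn_weights`, `weighted_distance` and `recommend`, and the `smoothing` value of 0.1, are hypothetical. The sample rows are copied from the `df_music` preview.

```python
from statistics import pstdev

# Assumption: only the continuous audio features are compared; id, name,
# artists and the categorical key/mode/time_signature columns are left out.
FEATURES = ["danceability", "energy", "loudness", "speechiness",
            "acousticness", "instrumentalness", "liveness", "valence", "tempo"]


def min_max_scale(songs, features=FEATURES):
    """Rescale each feature to 0..1 so tempo cannot swamp the 0..1 columns."""
    scaled = [dict(song) for song in songs]  # copy; leave the input untouched
    for f in features:
        lo = min(s[f] for s in songs)
        hi = max(s[f] for s in songs)
        span = hi - lo
        for s in scaled:
            # A constant column carries no information, so map it to 0.
            s[f] = (s[f] - lo) / span if span else 0.0
    return scaled


def learn_weights(liked, features=FEATURES, smoothing=0.1):
    """Hypothetical heuristic: the less the liked songs differ in a column,
    the more that column counts. Expects already-scaled songs."""
    raw = {f: 1.0 / (pstdev([s[f] for s in liked]) + smoothing) for f in features}
    total = sum(raw.values())
    return {f: w / total for f, w in raw.items()}  # weights sum to 1


def weighted_distance(a, b, weights):
    """Weighted average of per-column absolute differences (0 = identical)."""
    return sum(w * abs(a[f] - b[f]) for f, w in weights.items())


def recommend(favorite_name, songs, weights, exclude=()):
    """Name of the song closest to favorite_name, skipping it and `exclude`."""
    fav = next((s for s in songs if s["name"] == favorite_name), None)
    if fav is None:
        raise ValueError(f"{favorite_name!r} is not in the dataset")
    skip = {favorite_name, *exclude}
    candidates = [s for s in songs if s["name"] not in skip]
    best = min(candidates, key=lambda s: weighted_distance(fav, s, weights))
    return best["name"]


# Sample rows copied from the df_music preview (continuous features only).
COLUMNS = ["name"] + FEATURES
ROWS = [
    ("Shape of You", 0.825, 0.652, -3.183, 0.0802, 0.5810, 0.000000, 0.0931, 0.9310, 95.977),
    ("Despacito - Remix", 0.694, 0.815, -4.328, 0.1200, 0.2290, 0.000000, 0.0924, 0.8130, 88.931),
    ("Despacito (Featuring Daddy Yankee)", 0.660, 0.786, -4.757, 0.1700, 0.2090, 0.000000, 0.1120, 0.8460, 177.833),
    ("Something Just Like This", 0.617, 0.635, -6.769, 0.0317, 0.0498, 0.000014, 0.1640, 0.4460, 103.019),
    ("Don't Let Me Down", 0.542, 0.859, -5.651, 0.1970, 0.1600, 0.004660, 0.1370, 0.4030, 159.797),
    ("Dusk Till Dawn - Radio Edit", 0.258, 0.437, -6.593, 0.0390, 0.1010, 0.000001, 0.1060, 0.0967, 180.043),
]
songs = [dict(zip(COLUMNS, row)) for row in ROWS]
scaled = min_max_scale(songs)

# A listener says these two songs feel alike; learn which columns they share.
liked = [s for s in scaled if s["name"] in {"Shape of You", "Despacito - Remix"}]
weights = learn_weights(liked)
print(recommend("Shape of You", scaled, weights, exclude={"Despacito - Remix"}))
# expected: Despacito (Featuring Daddy Yankee)
```

On the full dataset, `df_music.to_dict("records")` turns the DataFrame into the list of dicts these functions expect. Scaling is done over whatever songs are passed in, so the weights and distances change if the song pool changes. Because `recommend` raises `ValueError` for any song outside the table, the top-100 limitation discussed under Issues shows up directly as an error.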

Issues:

This dataset is both limited and outdated. It dates from 2017 and contains only that year's top songs, so it misses every song written or popularized after 2017, five years ago at the time of writing. It is also restricted to the top 100 songs of that year, a tiny sample of all the music that exists. As a result, the generator can only accept and recommend songs from this top 100, and it cannot suggest less popular songs that might be even better matches.

Citations

Music Streaming Statistics in 2023 (US & Global Data): https://musicalpursuits.com/music-streaming/

Further reading on how music recommendations work: https://www.eliftech.com/insights/all-you-need-to-know-about-a-music-recommendation-system-with-a-step-by-step-guide-to-creating-it/#:~:text=Platforms%20that%20sell%20music%20tracks,him%20to%20make%20more%20purchases.

Kaggle dataset: https://www.kaggle.com/code/jsongunsw/spotify-datasets/data