Since the pandemic started in 2020, the music streaming population has increased by 26.4% to 523.9 million people. Evidently, music is important to a large group of people, but it is often hard to find a new song to listen to, or other songs similar to the ones you already know you like.
This generator will take in one of your favorite songs and recommend another song with a similar feel for you to listen to.
I am using the Kaggle Spotify DataSets, which include the top 100 songs from 2017 and record these attributes for each song:
data_dict = {'id': 'a unique id only relevant to Spotify',
'name': 'name of the song',
'artists': 'artist or artists performing the song',
'danceability': 'estimated measure of how danceable the song is (0 through 1)',
'energy': 'estimated measure of how energetic the song feels (0 through 1)',
'key': 'integer encoding of the key in pitch-class notation (0 = C, 1 = C#/Db, and so on)',
'loudness': 'overall loudness of the song in decibels (values closer to 0 are louder; more negative values are quieter)',
'mode': 'modality of the song (1 = major, 0 = minor)',
'speechiness': 'estimated measure of how much spoken word the song contains (0 through 1)',
'acousticness': 'estimated measure of how acoustic the song is (0 through 1)',
'instrumentalness': 'estimated measure of how instrumental (free of vocals) the song is (0 through 1)',
'liveness': 'estimated likelihood that the song was recorded live in front of an audience (0 through 1)',
'valence': 'estimated measure of how positive the song sounds (0 through 1)',
'tempo': 'tempo of the song in beats per minute',
'duration_ms': 'duration of the song in milliseconds',
'time_signature': 'time signature of the song (4.0 = 4/4)'}
import pandas as pd
df_music = pd.read_csv("spotify2017.csv")
df_music
index | id | name | artists | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 7qiZfU4dY1lWllzX7mPBI | Shape of You | Ed Sheeran | 0.825 | 0.652 | 1.0 | -3.183 | 0.0 | 0.0802 | 0.5810 | 0.000000 | 0.0931 | 0.9310 | 95.977 | 233713.0 | 4.0 |
1 | 5CtI0qwDJkDQGwXD1H1cL | Despacito - Remix | Luis Fonsi | 0.694 | 0.815 | 2.0 | -4.328 | 1.0 | 0.1200 | 0.2290 | 0.000000 | 0.0924 | 0.8130 | 88.931 | 228827.0 | 4.0 |
2 | 4aWmUDTfIPGksMNLV2rQP | Despacito (Featuring Daddy Yankee) | Luis Fonsi | 0.660 | 0.786 | 2.0 | -4.757 | 1.0 | 0.1700 | 0.2090 | 0.000000 | 0.1120 | 0.8460 | 177.833 | 228200.0 | 4.0 |
3 | 6RUKPb4LETWmmr3iAEQkt | Something Just Like This | The Chainsmokers | 0.617 | 0.635 | 11.0 | -6.769 | 0.0 | 0.0317 | 0.0498 | 0.000014 | 0.1640 | 0.4460 | 103.019 | 247160.0 | 4.0 |
4 | 3DXncPQOG4VBw3QHh3S81 | I'm the One | DJ Khaled | 0.609 | 0.668 | 7.0 | -4.284 | 1.0 | 0.0367 | 0.0552 | 0.000000 | 0.1670 | 0.8110 | 80.924 | 288600.0 | 4.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
95 | 1PSBzsahR2AKwLJgx8ehB | Bad Things (with Camila Cabello) | Machine Gun Kelly | 0.675 | 0.690 | 2.0 | -4.761 | 1.0 | 0.1320 | 0.2100 | 0.000000 | 0.2870 | 0.2720 | 137.817 | 239293.0 | 4.0 |
96 | 0QsvXIfqM0zZoerQfsI9l | Don't Let Me Down | The Chainsmokers | 0.542 | 0.859 | 11.0 | -5.651 | 1.0 | 0.1970 | 0.1600 | 0.004660 | 0.1370 | 0.4030 | 159.797 | 208053.0 | 4.0 |
97 | 7mldq42yDuxiUNn08nvzH | Body Like A Back Road | Sam Hunt | 0.731 | 0.469 | 5.0 | -7.226 | 1.0 | 0.0326 | 0.4630 | 0.000001 | 0.1030 | 0.6310 | 98.963 | 165387.0 | 4.0 |
98 | 7i2DJ88J7jQ8K7zqFX2fW | Now Or Never | Halsey | 0.658 | 0.588 | 6.0 | -4.902 | 0.0 | 0.0367 | 0.1050 | 0.000001 | 0.1250 | 0.4340 | 110.075 | 214802.0 | 4.0 |
99 | 1j4kHkkpqZRBwE0A4CN4Y | Dusk Till Dawn - Radio Edit | ZAYN | 0.258 | 0.437 | 11.0 | -6.593 | 0.0 | 0.0390 | 0.1010 | 0.000001 | 0.1060 | 0.0967 | 180.043 | 239000.0 | 4.0 |
100 rows × 16 columns
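The columns in this table sit on different scales: most features run from 0 to 1, while tempo, loudness, and duration do not. Comparing raw values would let the large-valued columns dominate any difference-based comparison. The following is a minimal sketch of one way to handle that, using min-max scaling. The helper `min_max_scale` and the small `sample` frame (a few rows copied from the table) are hypothetical names introduced for illustration, not part of the original notebook.

```python
import pandas as pd

# A few rows copied from the table above; the full notebook would use df_music.
sample = pd.DataFrame({
    "name": ["Shape of You", "Despacito - Remix",
             "Something Just Like This", "I'm the One"],
    "danceability": [0.825, 0.694, 0.617, 0.609],
    "loudness": [-3.183, -4.328, -6.769, -4.284],
    "tempo": [95.977, 88.931, 103.019, 80.924],
})


def min_max_scale(df, columns):
    """Return a copy of df with each listed column rescaled to the range [0, 1]."""
    scaled = df.copy()
    for col in columns:
        lo, hi = df[col].min(), df[col].max()
        # Guard against a constant column, which would otherwise divide by zero.
        scaled[col] = 0.0 if hi == lo else (df[col] - lo) / (hi - lo)
    return scaled


feature_cols = ["danceability", "loudness", "tempo"]
scaled_sample = min_max_scale(sample, feature_cols)
print(scaled_sample)
```

After scaling, the smallest value in each column maps to 0 and the largest to 1, so a difference of 0.1 means roughly the same thing in every column. Other choices, such as standardizing to zero mean and unit variance, would work as well.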
The plan is to create an algorithm that computes the similarity of two songs from the differences between their values in each feature column. The machine learning component takes in several songs that one person likes and considers similar, and uses them to learn the weights of a weighted average, so that the columns that matter more to that listener count for more when comparing songs.
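The idea described above can be sketched as a weighted distance over feature columns. This is only an illustration under stated assumptions, not the project's final method. The weight rule in `learn_weights` (one over the variance of each feature among the liked songs) is a simple heuristic chosen here for illustration: features on which the liked songs agree get more weight. The names `songs`, `FEATURES`, `learn_weights`, and `recommend` are hypothetical. The sketch uses only columns that are already on a 0 to 1 scale; columns such as tempo would need rescaling first.

```python
import numpy as np
import pandas as pd

# Rows copied from the table above, limited to columns already on a 0-1 scale.
songs = pd.DataFrame({
    "name": ["Shape of You", "Despacito - Remix", "Something Just Like This",
             "I'm the One", "Now Or Never", "Dusk Till Dawn - Radio Edit"],
    "danceability": [0.825, 0.694, 0.617, 0.609, 0.658, 0.258],
    "energy": [0.652, 0.815, 0.635, 0.668, 0.588, 0.437],
    "acousticness": [0.5810, 0.2290, 0.0498, 0.0552, 0.1050, 0.1010],
    "valence": [0.9310, 0.8130, 0.4460, 0.8110, 0.4340, 0.0967],
})
FEATURES = ["danceability", "energy", "acousticness", "valence"]


def learn_weights(df, liked_names, eps=1e-6):
    """Assumed heuristic: weight each feature by 1 / its variance among liked songs."""
    liked_rows = df[df["name"].isin(liked_names)]
    inv_var = 1.0 / (liked_rows[FEATURES].var(ddof=0).to_numpy() + eps)
    return inv_var / inv_var.sum()  # normalise so the weights sum to 1


def recommend(df, song_name, weights, exclude=()):
    """Return the candidate with the smallest weighted distance to song_name."""
    target = df.loc[df["name"] == song_name, FEATURES].to_numpy()[0]
    skip = list(exclude) + [song_name]  # never recommend the query song itself
    candidates = df[~df["name"].isin(skip)]
    diffs = candidates[FEATURES].to_numpy() - target
    distances = np.sqrt((weights * diffs ** 2).sum(axis=1))
    return candidates.iloc[int(np.argmin(distances))]["name"]


# Baseline: every feature counts equally.
equal_weights = np.full(len(FEATURES), 1 / len(FEATURES))
print(recommend(songs, "Something Just Like This", equal_weights))

# Learn weights from songs the listener considers similar, then recommend
# something new by excluding the songs they already named.
liked = ["Something Just Like This", "Now Or Never"]
weights = learn_weights(songs, liked)
for feature, weight in zip(FEATURES, weights):
    print(f"{feature}: {weight:.3f}")
print(recommend(songs, "Something Just Like This", weights, exclude=liked))
```

With only two liked songs the variance estimate is very rough, so in practice the weights would be more trustworthy with more examples, or with a learned model in place of this heuristic.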
This dataset is both limited and outdated. It covers only the top songs of 2017, so it does not account for any song that was written after 2017 or that gained popularity later, and 2017 is at this point five years in the past. It is also restricted to the top 100 songs of that year, which is a tiny slice of the vast expanse of music. As a result, the generator can only take in and recommend songs from this top 100, and it misses any less popular songs that could be an even better match.
Music Streaming Statistics in 2023 (US & Global Data): https://musicalpursuits.com/music-streaming/.
Further reading on how music recommendations work: https://www.eliftech.com/insights/all-you-need-to-know-about-a-music-recommendation-system-with-a-step-by-step-guide-to-creating-it/#:~:text=Platforms%20that%20sell%20music%20tracks,him%20to%20make%20more%20purchases.
Kaggle Data Set: https://www.kaggle.com/code/jsongunsw/spotify-datasets/data