Every day, thousands of artists release new music. Some of these songs become very popular and others do not, yet we don't always know why we prefer certain songs over others. Every song has its own set of musical attributes that make it unique, and these attributes influence whether people like or dislike it. Some attributes may influence a listener's preference more than others.
Earlier studies have examined this question, including one by Nunes & Ordanini, who "used audio information to show that songs that were number 1 hits on the Billboard Hot 100 Charts in the past 55 years had distinctly different instrumentation than songs that never climbed above the 90th position on these charts" (Rosati et al. 2021).
This project aims to use machine learning to predict whether a song will be a hit or a flop based on the song's attributes.
The data set this project uses contains several CSV files, each holding the songs from one decade, from the 1960s through the 2010s. Every song has the same set of attributes, which appear as the columns of the sample below.
The sample below shows the file that contains the songs from the 2000s.
```python
import pandas as pd

# Load the 2000s file and display it as the notebook cell output
df_2000 = pd.read_csv('dataset-of-00s.csv')
df_2000
```
| | track | artist | uri | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature | chorus_hit | sections | target |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lucky Man | Montgomery Gentry | spotify:track:4GiXBCUF7H6YfNQsnBRIzl | 0.578 | 0.4710 | 4 | -7.270 | 1 | 0.0289 | 0.368000 | 0.000000 | 0.1590 | 0.532 | 133.061 | 196707 | 4 | 30.88059 | 13 | 1 |
1 | On The Hotline | Pretty Ricky | spotify:track:1zyqZONW985Cs4osz9wlsu | 0.704 | 0.8540 | 10 | -5.477 | 0 | 0.1830 | 0.018500 | 0.000000 | 0.1480 | 0.688 | 92.988 | 242587 | 4 | 41.51106 | 10 | 1 |
2 | Clouds Of Dementia | Candlemass | spotify:track:6cHZf7RbxXCKwEkgAZT4mY | 0.162 | 0.8360 | 9 | -3.009 | 1 | 0.0473 | 0.000111 | 0.004570 | 0.1740 | 0.300 | 86.964 | 338893 | 4 | 65.32887 | 13 | 0 |
3 | Heavy Metal, Raise Hell! | Zwartketterij | spotify:track:2IjBPp2vMeX7LggzRN3iSX | 0.188 | 0.9940 | 4 | -3.745 | 1 | 0.1660 | 0.000007 | 0.078400 | 0.1920 | 0.333 | 148.440 | 255667 | 4 | 58.59528 | 9 | 0 |
4 | I Got A Feelin' | Billy Currington | spotify:track:1tF370eYXUcWwkIvaq3IGz | 0.630 | 0.7640 | 2 | -4.353 | 1 | 0.0275 | 0.363000 | 0.000000 | 0.1250 | 0.631 | 112.098 | 193760 | 4 | 22.62384 | 10 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5867 | Summer Rain | Carl Thomas | spotify:track:0NBHHa8wwwmBnn3aAzX5wJ | 0.667 | 0.6270 | 6 | -10.488 | 0 | 0.0654 | 0.097200 | 0.000052 | 0.1110 | 0.784 | 186.081 | 232560 | 4 | 40.87045 | 10 | 1 |
5868 | And I | Ciara | spotify:track:1Jp9n1uHB72CfK31j4mEPh | 0.691 | 0.3890 | 6 | -10.125 | 1 | 0.0653 | 0.255000 | 0.000000 | 0.0981 | 0.437 | 122.219 | 233840 | 4 | 81.77735 | 7 | 1 |
5869 | Mass in B minor BWV 232, Missa: Duetto - Chris... | Johann Sebastian Bach | spotify:track:4NIOi1ImMfdufRTsgoKjbD | 0.297 | 0.0773 | 2 | -23.839 | 1 | 0.0620 | 0.951000 | 0.000217 | 0.1210 | 0.401 | 75.916 | 275560 | 4 | 37.51903 | 11 | 0 |
5870 | Loog | The Clean | spotify:track:2Qyj2nUdm8y37TCCzDasFn | 0.390 | 0.6010 | 7 | -8.236 | 0 | 0.0291 | 0.031300 | 0.947000 | 0.1190 | 0.439 | 116.122 | 223627 | 4 | 39.84092 | 11 | 0 |
5871 | What The World Needs | Wynonna | spotify:track:38Q6YF0TO7E4Dq6K0zdVUk | 0.539 | 0.7400 | 0 | -5.566 | 0 | 0.0490 | 0.194000 | 0.000000 | 0.0760 | 0.675 | 170.054 | 217160 | 4 | 24.95471 | 13 | 1 |
5872 rows × 19 columns
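Since each decade has its own file, it can be convenient to combine them into one table. The following is a minimal sketch of that step. Only `dataset-of-00s.csv` appears in this notebook; the other file names are an assumption that the remaining decades follow the same naming pattern, and `load_decades` and the added `decade` column are hypothetical names introduced for illustration.

```python
from pathlib import Path

import pandas as pd

# Assumption: every decade file follows the 'dataset-of-XXs.csv' pattern
# of the one file shown in this notebook.
DECADES = ["60s", "70s", "80s", "90s", "00s", "10s"]


def load_decades(data_dir=".", decades=DECADES):
    """Read each decade file found in data_dir and stack them into one frame."""
    frames = []
    for decade in decades:
        path = Path(data_dir) / f"dataset-of-{decade}.csv"
        if not path.exists():
            continue  # skip decades whose file is not present
        frame = pd.read_csv(path)
        frame["decade"] = decade  # remember which file each row came from
        frames.append(frame)
    if not frames:
        return pd.DataFrame()  # nothing found: return an empty frame
    return pd.concat(frames, ignore_index=True)


songs = load_decades()
print(songs.shape)
```

Keeping the decade as a column makes it possible to train one model on all songs or to compare decades later without reloading the files.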
The goal of this project is to determine whether a song's musical attributes can help us predict if it will be a hit or a flop. A prediction model can be trained on the attributes of past songs in the data set to classify each song as a hit or a flop. Once trained, the model can be applied to newly released music. We can also use the data set to understand which attributes have the most impact on whether a song becomes a hit or a flop.
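As a rough illustration of this goal, the sketch below trains a classifier on the numeric columns and ranks the attributes by importance. It is not necessarily the model this project ends up using: the random forest from scikit-learn is a choice made here for illustration, `train_hit_classifier` is a hypothetical helper, and the code assumes that `target` is 1 for a hit and 0 for a flop, which matches the sample rows above.

```python
from pathlib import Path

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Numeric attribute columns taken from the sample table above.
FEATURES = [
    "danceability", "energy", "key", "loudness", "mode", "speechiness",
    "acousticness", "instrumentalness", "liveness", "valence", "tempo",
    "duration_ms", "time_signature", "chorus_hit", "sections",
]


def train_hit_classifier(df, features=FEATURES, seed=0):
    """Fit a random forest on `features` and report held-out accuracy."""
    X = df[features]
    y = df["target"]  # assumption: 1 = hit, 0 = flop
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    # Impurity-based importances, highest first
    importances = pd.Series(
        model.feature_importances_, index=features
    ).sort_values(ascending=False)
    return model, accuracy, importances


# Runs only when the 2000s file from this notebook is available.
if Path("dataset-of-00s.csv").exists():
    df_2000 = pd.read_csv("dataset-of-00s.csv")
    model, accuracy, importances = train_hit_classifier(df_2000)
    print(f"held-out accuracy: {accuracy:.3f}")
    print(importances.head())
```

The importances give only a first indication of which attributes matter. They describe how much this particular model relies on each column, not a causal effect on popularity.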
SpotifyUnchained can be used to retrieve the latest songs, which can then be passed through Exportify to obtain the features our predictor uses. Finally, these features can be given to the trained model to classify each song as a hit or a flop.
https://spotifyunchained.com/ - Retrieves the songs newly released on Spotify each week
https://exportify.net/#playlists - Provides the attributes of the latest songs (Danceability, Energy, Key, Loudness, Mode, Speechiness, Acousticness, Instrumentalness, Liveness, Valence, Tempo, Time Signature)
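The last step of this workflow could look roughly like the sketch below. It assumes the exported file uses the capitalized attribute names listed above as its column headers, which should be checked against a real export. `prepare_new_songs` and `label_predictions` are hypothetical helpers, and `model` stands for any fitted classifier with a `predict` method. The list above does not include `duration_ms`, `chorus_hit` or `sections`, so the sketch assumes a model trained only on the twelve shared attributes.

```python
import pandas as pd

# Attributes available both in the training data and in the list above.
SHARED_FEATURES = [
    "danceability", "energy", "key", "loudness", "mode", "speechiness",
    "acousticness", "instrumentalness", "liveness", "valence", "tempo",
    "time_signature",
]


def prepare_new_songs(exported):
    """Rename exported headers to the training column names and select them."""
    # e.g. "Time Signature" -> "time_signature" (assumed header style)
    renamed = exported.rename(
        columns=lambda name: name.strip().lower().replace(" ", "_")
    )
    missing = [col for col in SHARED_FEATURES if col not in renamed.columns]
    if missing:
        raise ValueError(f"export is missing columns: {missing}")
    return renamed[SHARED_FEATURES]


def label_predictions(model, exported):
    """Return a copy of the export with a 'predicted' hit/flop column."""
    features = prepare_new_songs(exported)
    labeled = exported.copy()
    labeled["predicted"] = [
        "hit" if label == 1 else "flop" for label in model.predict(features)
    ]
    return labeled
```

Raising an error on missing columns makes a header mismatch visible immediately, instead of letting the model silently predict from the wrong inputs.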
References
Rosati Dora P., Woolhouse Matthew H., Bolker Benjamin M. and Earn David J. D. 2021 Modelling song popularity as a contagious process. Proc. R. Soc. A. 477: 20210457. http://doi.org/10.1098/rspa.2021.0457