Hit or Flop Song?

Project Proposal

DS 2500

Every day, thousands of artists release new music. Some of these songs become very popular, and some do not, but we don't always know why we like certain songs more than others. Every song has its own set of musical attributes that make it unique, and these attributes influence whether people like or dislike the song. Some attributes may influence a listener's preference more than others.

Past studies have examined this question, including one by Nunes & Ordanini, who "used audio information to show that songs that were number 1 hits on the Billboard Hot 100 Charts in the past 55 years had distinctly different instrumentation than songs that never climbed above the 90th position on these charts" (Rosati et al. 2021).

This project aims to see whether machine learning can predict if a song will be a hit or a flop based on the song's attributes.

https://www.kaggle.com/datasets/theoverman/the-spotify-hit-predictor-dataset?select=dataset-of-10s.csv - The Spotify Hit Predictor Dataset on Kaggle, the data source for this project

The data set this project uses contains several CSV files, each holding the songs from one decade, from the 1960s through the 2010s. Every song has the following attributes:

  • Track
  • Artist
  • URI
  • Danceability: 0 - 1; least to most danceable
  • Energy: 0 - 1; lowest to highest energy
  • Key: ints correspond to pitches (0 is C, etc.)
  • Loudness: decibels
  • Mode: minor is 0, major is 1 (melodic scales)
  • Speechiness: 0 - 1; presence of spoken words - more speech is closer to 1
  • Acousticness: 0 - 1; 1 is high confidence that track is acoustic
  • Instrumentalness: 0 - 1; tracks closer to 1 contain less vocal content
  • Liveness: presence of an audience in the recording; above 0.8 = strong likelihood that the track is live
  • Valence: 0 - 1; least to most positive
  • Tempo: beats per minute
  • Duration: milliseconds
  • Time signature: notation for how many beats are in each measure
  • Chorus hit: timestamp of where chorus starts (author's estimate)
  • Sections: number of sections in the track
  • Target (hit or flop)
    • 1 is a hit: the song has appeared at least once in the weekly list of Hot-100 tracks (issued by Billboard) in that decade
    • 0 is a flop
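
Some of the encoded attributes are easier to read once decoded. The following is a minimal sketch, assuming that the key integers continue from 0 = C in standard pitch class order (the list above only states the first value) and that any value outside 0 to 11 means no key was detected. The helper names `describe_key` and `format_duration` are hypothetical and are not part of the data set.

```python
# Pitch names in pitch class order, starting from 0 = C as stated above.
# Assumption: the remaining integers follow standard pitch class notation.
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def describe_key(key: int, mode: int) -> str:
    """Return a readable key such as 'E major' (hypothetical helper)."""
    if not 0 <= key <= 11 or mode not in (0, 1):
        return "unknown"
    quality = "major" if mode == 1 else "minor"  # mode: minor is 0, major is 1
    return f"{PITCH_CLASSES[key]} {quality}"


def format_duration(duration_ms: int) -> str:
    """Convert a duration in milliseconds to 'm:ss' (hypothetical helper)."""
    minutes, seconds = divmod(duration_ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"
```

For example, a track with key 4, mode 1, and a duration of 196707 ms would decode as `describe_key(4, 1)` giving `'E major'` and `format_duration(196707)` giving `'3:16'`.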

Below is a sample of the data: the songs from the 2000s. The data set includes a similar file for each of the other decades.

In [1]:
import pandas as pd

df_2000 = pd.read_csv('dataset-of-00s.csv')
df_2000
Out[1]:
track artist uri danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence tempo duration_ms time_signature chorus_hit sections target
0 Lucky Man Montgomery Gentry spotify:track:4GiXBCUF7H6YfNQsnBRIzl 0.578 0.4710 4 -7.270 1 0.0289 0.368000 0.000000 0.1590 0.532 133.061 196707 4 30.88059 13 1
1 On The Hotline Pretty Ricky spotify:track:1zyqZONW985Cs4osz9wlsu 0.704 0.8540 10 -5.477 0 0.1830 0.018500 0.000000 0.1480 0.688 92.988 242587 4 41.51106 10 1
2 Clouds Of Dementia Candlemass spotify:track:6cHZf7RbxXCKwEkgAZT4mY 0.162 0.8360 9 -3.009 1 0.0473 0.000111 0.004570 0.1740 0.300 86.964 338893 4 65.32887 13 0
3 Heavy Metal, Raise Hell! Zwartketterij spotify:track:2IjBPp2vMeX7LggzRN3iSX 0.188 0.9940 4 -3.745 1 0.1660 0.000007 0.078400 0.1920 0.333 148.440 255667 4 58.59528 9 0
4 I Got A Feelin' Billy Currington spotify:track:1tF370eYXUcWwkIvaq3IGz 0.630 0.7640 2 -4.353 1 0.0275 0.363000 0.000000 0.1250 0.631 112.098 193760 4 22.62384 10 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5867 Summer Rain Carl Thomas spotify:track:0NBHHa8wwwmBnn3aAzX5wJ 0.667 0.6270 6 -10.488 0 0.0654 0.097200 0.000052 0.1110 0.784 186.081 232560 4 40.87045 10 1
5868 And I Ciara spotify:track:1Jp9n1uHB72CfK31j4mEPh 0.691 0.3890 6 -10.125 1 0.0653 0.255000 0.000000 0.0981 0.437 122.219 233840 4 81.77735 7 1
5869 Mass in B minor BWV 232, Missa: Duetto - Chris... Johann Sebastian Bach spotify:track:4NIOi1ImMfdufRTsgoKjbD 0.297 0.0773 2 -23.839 1 0.0620 0.951000 0.000217 0.1210 0.401 75.916 275560 4 37.51903 11 0
5870 Loog The Clean spotify:track:2Qyj2nUdm8y37TCCzDasFn 0.390 0.6010 7 -8.236 0 0.0291 0.031300 0.947000 0.1190 0.439 116.122 223627 4 39.84092 11 0
5871 What The World Needs Wynonna spotify:track:38Q6YF0TO7E4Dq6K0zdVUk 0.539 0.7400 0 -5.566 0 0.0490 0.194000 0.000000 0.0760 0.675 170.054 217160 4 24.95471 13 1

5872 rows × 19 columns
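
Because each decade comes as a separate file, the tables will likely need to be combined before modelling. The sketch below shows one possible way to do that. It assumes the other files follow the same naming pattern as `dataset-of-00s.csv` and `dataset-of-10s.csv` (only those two names appear in this proposal); `DECADE_FILES`, `load_decades`, and the added `decade` column are illustrative names.

```python
from typing import Mapping

import pandas as pd

# Assumed file names: only the 00s and 10s files are named in this proposal;
# the remaining entries follow the same pattern and may need adjusting.
DECADE_FILES = {
    "1960s": "dataset-of-60s.csv",
    "1970s": "dataset-of-70s.csv",
    "1980s": "dataset-of-80s.csv",
    "1990s": "dataset-of-90s.csv",
    "2000s": "dataset-of-00s.csv",
    "2010s": "dataset-of-10s.csv",
}


def load_decades(decade_files: Mapping[str, str]) -> pd.DataFrame:
    """Read one CSV per decade and stack them into a single table."""
    frames = []
    for decade, path in decade_files.items():
        frame = pd.read_csv(path)
        frame["decade"] = decade  # remember which file each song came from
        frames.append(frame)
    # ignore_index gives the combined table a fresh 0..n-1 row index
    return pd.concat(frames, ignore_index=True)
```

With the files in the working directory, `songs = load_decades(DECADE_FILES)` would return every song in one DataFrame, and `songs.groupby("decade")["target"].mean()` would give the share of hits per decade.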

The goal of this project is to determine whether a song's musical attributes can help us predict if it will be a hit or a flop. Using the data set, a prediction model can be trained on the attributes of past songs to classify each song as a hit or a flop. Once trained, the model can be applied to newly released music. We can also use this data set to understand which attributes have the greatest impact on whether a song becomes a hit or a flop.
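
As a rough illustration of this plan, the sketch below trains a classifier on the numeric attributes and ranks them by importance. The proposal does not commit to a particular model, so the random forest from scikit-learn is an assumption made for this example, as are the 80/20 split and the names `FEATURES` and `train_hit_classifier`.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Numeric attribute columns, named as in the sample output above.
FEATURES = [
    "danceability", "energy", "key", "loudness", "mode", "speechiness",
    "acousticness", "instrumentalness", "liveness", "valence", "tempo",
    "duration_ms", "time_signature", "chorus_hit", "sections",
]


def train_hit_classifier(songs: pd.DataFrame, features=FEATURES, seed: int = 0):
    """Fit a classifier on labelled songs (hypothetical helper).

    Returns the fitted model, its accuracy on a held-out 20% of the songs,
    and the feature importances sorted from most to least important.
    """
    X = songs[list(features)]
    y = songs["target"]  # 1 is a hit, 0 is a flop
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    # Assumption: a random forest is used here only as an example model.
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    importances = pd.Series(model.feature_importances_, index=list(features))
    return model, accuracy, importances.sort_values(ascending=False)
```

The returned importances are one way to approach the question of which attributes matter most, although impurity-based importances are only a rough guide and would be worth cross-checking with another method.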

SpotifyUnchained can be used to retrieve the latest songs, which can then be put through Exportify to obtain the features used by our predictor. Finally, this data can be used to predict whether each song is a hit or a flop.

https://spotifyunchained.com/ - Retrieves newly released songs on Spotify from each week

https://exportify.net/#playlists - Provides the attributes of the latest songs (Danceability, Energy, Key, Loudness, Mode, Speechiness, Acousticness, Instrumentalness, Liveness, Valence, Tempo, Time Signature)
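
The attributes listed for Exportify do not include duration, chorus hit, or sections, so a model meant to score new songs would need to be trained on the twelve shared attributes only. The sketch below shows one way the exported columns might be aligned with the training data. It assumes the export uses title-case headers such as `Time Signature`, as written in the list above; the real headers may differ, and `SHARED_FEATURES` and `align_export_columns` are hypothetical names.

```python
import pandas as pd

# The twelve attributes listed above, in the training data's snake_case naming.
SHARED_FEATURES = [
    "danceability", "energy", "key", "loudness", "mode", "speechiness",
    "acousticness", "instrumentalness", "liveness", "valence", "tempo",
    "time_signature",
]


def align_export_columns(export: pd.DataFrame) -> pd.DataFrame:
    """Rename export headers to snake_case and keep only the shared features."""
    # Assumption: headers look like "Time Signature"; adjust if they differ.
    renamed = export.rename(
        columns=lambda name: name.strip().lower().replace(" ", "_")
    )
    missing = [col for col in SHARED_FEATURES if col not in renamed.columns]
    if missing:
        raise KeyError(f"export is missing expected columns: {missing}")
    return renamed[SHARED_FEATURES]
```

The aligned table could then be passed to the `predict` method of a classifier trained on the same twelve columns.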

References

Rosati Dora P., Woolhouse Matthew H., Bolker Benjamin M. and Earn David J. D. 2021 Modelling song popularity as a contagious process. Proc. R. Soc. A. 477: 20210457. http://doi.org/10.1098/rspa.2021.0457
