{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "NAIEGNe9X3IE" }, "source": [ "# Song Popularity Prediction" ] }, { "cell_type": "markdown", "metadata": { "id": "UMPUeVHAX2VL" }, "source": [ "# Motivation:\n", "## Problem\n", "Determining whether a song is popular or not, before it is widely shared, is difficult. Alternatively, one can solicit feedback from expert listeners but these experts often disagree and may not be strong indicators of a song's future popularity\n", "\n", "## Solution\n", "Spotify is the music streaming service that has become one of the most popular platforms in the world. The scope of their platform allows for a rich dataset of music preferences, either in the explicit \"liking\" of a song or the frequency with which users play it. **The goal of this project is to identify and use a relationship between spotify's features (e.g. danceability, audio valence) and the popularity of a song**. \n", "\n", "## Impact\n", "If successful, this work may yield a classifier which predicts how popular a song is based on these features. Such a predictor could be helpful to gauge interest in a song before its release to audiences, helping record companies identify promising up-and-coming artists without play-testing their music over the radio. \n", "\n", "One (very) negative outcome of such a classifier is that it may encourage the prioritization of popular music over niche songs which aren't as commonly appreciated." ] }, { "cell_type": "markdown", "metadata": { "id": "FOZdwTPSYMDC" }, "source": [ "# Dataset\n", "## Detail\n", "We will use a [Kaggle Dataset of Spotify Music](https://www.kaggle.com/code/aeryan/spotify-music-analysis/data) to observe the following features for each song:\n", "\n", "- song name\n", "\n", "- song_duration\n", "- acousticness\n", "- danceability \n", "- energy\n", "- instrumental\n", "- key\n", "- liveness\n", "- loudness\n", "- audio_mode\n", "- speechiness\n", "- tempo\n", "- time_signature\n", "- audio_valence\n", "\n", "| song_name | song_popularity | song_duration_ms | acousticness | danceability | energy | instrumentalness | key | liveness | loudness | audio_mode | speechiness | tempo | time_signature | audio_valence | |\n", "|----------:|---------------------------:|-----------------:|-------------:|-------------:|-------:|-----------------:|---------:|---------:|---------:|-----------:|------------:|-------:|---------------:|--------------:|-------|\n", "| 0 | Boulevard of Broken Dreams | 73 | 262333 | 0.005520 | 0.496 | 0.682 | 0.000029 | 8 | 0.0589 | -4.095 | 1 | 0.0294 | 167.060 | 4 | 0.474 |\n", "| 1 | In The End | 66 | 216933 | 0.010300 | 0.542 | 0.853 | 0.000000 | 3 | 0.1080 | -6.407 | 0 | 0.0498 | 105.256 | 4 | 0.370 |\n", "| 2 | Seven Nation Army | 76 | 231733 | 0.008170 | 0.737 | 0.463 | 0.447000 | 0 | 0.2550 | -7.828 | 1 | 0.0792 | 123.881 | 4 | 0.324 |\n", "| 3 | By The Way | 74 | 216933 | 0.026400 | 0.451 | 0.970 | 0.003550 | 0 | 0.1020 | -4.938 | 1 | 0.1070 | 122.444 | 4 | 0.198 |\n", "| 4 | How You Remind Me | 56 | 223826 | 0.000954 | 0.447 | 0.766 | 0.000000 | 10 | 0.1130 | -5.065 | 1 | 0.0313 | 172.011 | 4 | 0.574 |\n", "\n", "In addition to these, each song has a relative popularity ranking (i.e. 1 is the most popular song, 10 is the 10th most popular ...). **Our project seeks to use the features above to estimate the popularity of a song.**\n", "\n", "## Potential Problems\n", "As observed, we are currently unclear as to how each of the features above is defined. Our working assumption is these features are automatically computed and would be available to some future song. \n", "\n", "Of course, measuring some implicit `danceability` of a song is subjective but we are concerned the the scores assigned by Spotify do not meaningfully encapsulate their namesakes. This could be problematic in the interpretation of the outcome. Can we trust that songs with a higher `danceability` score do indeed encourage people to boogie down? To mitigate this we'll listen to a few songs and ensure that these features pass a quick sanity listen." ] }, { "cell_type": "markdown", "metadata": { "id": "WDzJXtDwZ3M0" }, "source": [ "## Method:\n", "We pose our problem as a regression (line of best fit) problem: given the listed features above (except 'name') we seek to estimate the popularity of each song. One advantage of this approach is that it offers an intuitive output as each feature will explicitly be associated with some increase or decrease in popularity." ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "DS3000_sample_ProectSketch.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 1 }