A real-world problem that I hope to provide input on is how best to determine the genre of any given song. With more genres of popular music than ever, and more cross-influence between genres, it can be difficult to determine exactly how a song fits into a particular genre, or whether a new genre needs to be constructed outside of pre-existing definitions. These genre definitions affect musical artists at both small and large scales. At the small scale, music genre classification affects new artists who hope to be noticed online, as algorithms on platforms such as Spotify take into account both artist-determined genres and computer-generated genres when recommending songs that a user may like.[1] At the large scale, awards like the Grammys currently differentiate genres based on committees of executives, but offer awards in only a limited number of genres, including pop, rock, and metal. Relying only on these genres may exclude talented, worthy artists who do not get recognition because their music does not fit one specific definition of a traditional genre.[2] Together, these issues show that genre classification is a problem that must be investigated for the future of the music industry.

In [1]:
# song genre
In [4]:
import pandas as pd
df = pd.read_csv(r'C:\Users\rdela\Code Directory\DS2500\archive\Data\features_30_sec.csv')
df.head()
Out[4]:
filename length chroma_stft_mean chroma_stft_var rms_mean rms_var spectral_centroid_mean spectral_centroid_var spectral_bandwidth_mean spectral_bandwidth_var ... mfcc16_var mfcc17_mean mfcc17_var mfcc18_mean mfcc18_var mfcc19_mean mfcc19_var mfcc20_mean mfcc20_var label
0 blues.00000.wav 661794 0.350088 0.088757 0.130228 0.002827 1784.165850 129774.064525 2002.449060 85882.761315 ... 52.420910 -1.690215 36.524071 -0.408979 41.597103 -2.303523 55.062923 1.221291 46.936035 blues
1 blues.00001.wav 661794 0.340914 0.094980 0.095948 0.002373 1530.176679 375850.073649 2039.036516 213843.755497 ... 55.356403 -0.731125 60.314529 0.295073 48.120598 -0.283518 51.106190 0.531217 45.786282 blues
2 blues.00002.wav 661794 0.363637 0.085275 0.175570 0.002746 1552.811865 156467.643368 1747.702312 76254.192257 ... 40.598766 -7.729093 47.639427 -1.816407 52.382141 -3.439720 46.639660 -2.231258 30.573025 blues
3 blues.00003.wav 661794 0.404785 0.093999 0.141093 0.006346 1070.106615 184355.942417 1596.412872 166441.494769 ... 44.427753 -3.319597 50.206673 0.636965 37.319130 -0.619121 37.259739 -3.407448 31.949339 blues
4 blues.00004.wav 661794 0.308526 0.087841 0.091529 0.002303 1835.004266 343399.939274 1748.172116 88445.209036 ... 86.099236 -5.454034 75.269707 -0.916874 53.613918 -4.404827 62.910812 -11.703234 55.195160 blues

5 rows × 60 columns

Data descriptions sourced from Andrada Olteanu's explainer. Each row summarizes one 30-second audio clip as a set of numerical features, which can show how different clips of audio correspond to different genres.

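Label Explanation
filename name of .wav file in dataset
length length of audio clip in number of samples
chroma_stft_mean mean of the chromagram computed from the short-time Fourier transform (energy in each pitch class over time)
chroma_stft_var variance of the chromagram computed from the short-time Fourier transform (energy in each pitch class over time)
rms_mean mean of the root-mean-square (RMS) energy of the signal, a measure of loudness
rms_var variance of the root-mean-square (RMS) energy of the signal, a measure of loudness
spectral_centroid_mean mean of the spectral centroid (weighted mean of frequencies present in sound)
spectral_centroid_var variance of the spectral centroid (weighted mean of frequencies present in sound)
spectral_bandwidth_mean mean of the spectral bandwidth (spread of frequencies around the spectral centroid)
spectral_bandwidth_var variance of the spectral bandwidth (spread of frequencies around the spectral centroid)
rolloff_mean mean of the spectral rolloff (frequency below which a specified percentage of total spectral energy lies)
rolloff_var variance of the spectral rolloff (frequency below which a specified percentage of total spectral energy lies)
zero_crossing_rate_mean mean of the rate at which the signal changes sign (positive to negative or negative to positive)
zero_crossing_rate_var variance of the rate at which the signal changes sign (positive to negative or negative to positive)
harmony_mean mean of the harmonic component of the signal (sound color)
harmony_var variance of the harmonic component of the signal (sound color)
perceptr_mean mean of the percussive component of the signal (sound rhythm and emotion)
perceptr_var variance of the percussive component of the signal (sound rhythm and emotion)
tempo estimated tempo of audio in beats per minute
mfcc[1-20]_mean mean of each of the 20 Mel-frequency cepstral coefficients, a small set of features which describe the overall shape of the spectral envelope
mfcc[1-20]_var variance of each of the 20 Mel-frequency cepstral coefficients, a small set of features which describe the overall shape of the spectral envelope
label genre assigned to the clip in the dataset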

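The machine learning method I hope to use is k-nearest neighbors classification, to see if these variables correspond to the labeled genre. Because k-nearest neighbors is a supervised classifier rather than a clustering algorithm, checking whether new or unexpected genres emerge would require pairing it with an unsupervised clustering method such as k-means.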
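To make the method concrete, here is a minimal from-scratch sketch of k-nearest neighbors using only the Python standard library. It is an illustration under stated assumptions, not the analysis itself: the helper names `standardize` and `knn_predict` are hypothetical, the two features and their values are made up for the example and are not taken from the dataset, and in practice a library implementation such as scikit-learn's `KNeighborsClassifier` would be the more usual choice. The features are standardized first because the columns live on very different scales (compare `rms_var` with `spectral_centroid_var` in the output above), and an unscaled distance would be dominated by the largest columns.

```python
import math
from collections import Counter


def standardize(rows):
    """Rescale each column to zero mean and unit variance (z-scores)."""
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 for constant columns
    scaled = [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]
    return scaled, means, stds


def knn_predict(train_rows, train_labels, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up toy values for illustration only; they are NOT rows from the dataset.
# Columns stand in for (tempo, spectral_centroid_mean).
toy_features = [
    [90.0, 1500.0],
    [95.0, 1600.0],
    [88.0, 1450.0],
    [150.0, 3000.0],
    [160.0, 3200.0],
    [155.0, 3100.0],
]
toy_labels = ["blues", "blues", "blues", "metal", "metal", "metal"]

scaled, means, stds = standardize(toy_features)

# A new, unlabeled clip must be scaled with the training means and deviations.
new_clip = [152.0, 2950.0]
new_scaled = [(v - m) / s for v, m, s in zip(new_clip, means, stds)]

predicted = knn_predict(scaled, toy_labels, new_scaled, k=3)
print(predicted)
```

With the real data, the same idea would apply to the numeric columns of `df` (everything except `filename`, `length`, and `label`), with part of the data held out to measure how often the predicted genre matches the label.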