A real-world problem that I hope to provide input on is how best to determine the genre of any given song. With more genres of popular music than ever, and with genres increasingly influencing one another, it can be difficult to determine exactly which genre a song fits into, or whether a new genre needs to be defined outside of pre-existing definitions. These genre definitions affect musical artists at both small and large scales. At the small scale, genre classification affects new artists who hope to be noticed online, as algorithms on platforms such as Spotify take into account both artist-determined genres and computer-generated genres when recommending songs that a user may like.[1] At the large scale, awards such as the Grammys currently differentiate genres through committees of executives, but offer awards in only a limited number of genres, including pop, rock, and metal. Relying only on these genres may exclude talented, worthy artists who go unrecognized because their music does not fit one specific definition of a traditional genre.[2] Together, these issues show that genre classification must be investigated for the future of the music industry.
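```python
# song genre: load the precomputed features for each 30-second audio clip
import pandas as pd

# Raw string so the Windows backslashes are not read as escape sequences
df = pd.read_csv(r'C:\Users\rdela\Code Directory\DS2500\archive\Data\features_30_sec.csv')
df.head()
```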
| | filename | length | chroma_stft_mean | chroma_stft_var | rms_mean | rms_var | spectral_centroid_mean | spectral_centroid_var | spectral_bandwidth_mean | spectral_bandwidth_var | ... | mfcc16_var | mfcc17_mean | mfcc17_var | mfcc18_mean | mfcc18_var | mfcc19_mean | mfcc19_var | mfcc20_mean | mfcc20_var | label |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | blues.00000.wav | 661794 | 0.350088 | 0.088757 | 0.130228 | 0.002827 | 1784.165850 | 129774.064525 | 2002.449060 | 85882.761315 | ... | 52.420910 | -1.690215 | 36.524071 | -0.408979 | 41.597103 | -2.303523 | 55.062923 | 1.221291 | 46.936035 | blues |
1 | blues.00001.wav | 661794 | 0.340914 | 0.094980 | 0.095948 | 0.002373 | 1530.176679 | 375850.073649 | 2039.036516 | 213843.755497 | ... | 55.356403 | -0.731125 | 60.314529 | 0.295073 | 48.120598 | -0.283518 | 51.106190 | 0.531217 | 45.786282 | blues |
2 | blues.00002.wav | 661794 | 0.363637 | 0.085275 | 0.175570 | 0.002746 | 1552.811865 | 156467.643368 | 1747.702312 | 76254.192257 | ... | 40.598766 | -7.729093 | 47.639427 | -1.816407 | 52.382141 | -3.439720 | 46.639660 | -2.231258 | 30.573025 | blues |
3 | blues.00003.wav | 661794 | 0.404785 | 0.093999 | 0.141093 | 0.006346 | 1070.106615 | 184355.942417 | 1596.412872 | 166441.494769 | ... | 44.427753 | -3.319597 | 50.206673 | 0.636965 | 37.319130 | -0.619121 | 37.259739 | -3.407448 | 31.949339 | blues |
4 | blues.00004.wav | 661794 | 0.308526 | 0.087841 | 0.091529 | 0.002303 | 1835.004266 | 343399.939274 | 1748.172116 | 88445.209036 | ... | 86.099236 | -5.454034 | 75.269707 | -0.916874 | 53.613918 | -4.404827 | 62.910812 | -11.703234 | 55.195160 | blues |
5 rows × 60 columns
Data descriptions are sourced from Andrada Olteanu's explainer. These features summarize different acoustic aspects of each audio clip and can show how different clips correspond to different genres.
Label | Explanation |
---|---|
filename | name of .wav file in dataset |
length | length of the audio clip, in samples |
chroma_stft_mean | mean of the chromagram computed from the short-time Fourier transform (energy in each of the 12 pitch classes over time) |
chroma_stft_var | variance of the chromagram computed from the short-time Fourier transform (energy in each of the 12 pitch classes over time) |
rms_mean | mean of the root-mean-square (RMS) energy of each frame (loudness over time) |
rms_var | variance of the root-mean-square (RMS) energy of each frame (loudness over time) |
spectral_centroid_mean | mean of weighted mean of frequencies present in sound |
spectral_centroid_var | variance of weighted mean of frequencies present in sound |
spectral_bandwidth_mean | mean of the spread of frequencies around the spectral centroid |
spectral_bandwidth_var | variance of the spread of frequencies around the spectral centroid |
rolloff_mean | mean of the frequency below which a specified percentage of the total spectral energy lies |
rolloff_var | variance of the frequency below which a specified percentage of the total spectral energy lies |
zero_crossing_rate_mean | mean of the rate at which the signal changes sign (positive to negative or back) |
zero_crossing_rate_var | variance of the rate at which the signal changes sign (positive to negative or back) |
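harmony_mean | mean of the harmonic content, which reflects sound color (timbre) |
harmony_var | variance of the harmonic content, which reflects sound color (timbre) |
perceptr_mean | mean of the perceptual content, which reflects sound rhythm and emotion |
perceptr_var | variance of the perceptual content, which reflects sound rhythm and emotion |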
tempo | estimated tempo of the audio, in beats per minute (BPM) |
mfcc[1-20]_mean | mean of each of the 20 Mel-frequency cepstral coefficients (MFCCs), a small set of features that describe the overall shape of the spectral envelope |
mfcc[1-20]_var | variance of each of the 20 Mel-frequency cepstral coefficients (MFCCs) |
label | genre label assigned to the clip in the dataset |
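To make a few of the descriptions above more concrete, here is a minimal NumPy sketch that computes simplified, frame-by-frame versions of five features (RMS, zero-crossing rate, spectral centroid, spectral bandwidth, and rolloff) and collapses each into a mean and variance, mirroring the `_mean`/`_var` column pairs. The frame size (2048), hop length (512), Hann window, 85% rolloff threshold, and the helper names `frame_signal` and `summarize_features` are assumptions chosen for illustration; this is not the pipeline that produced the CSV, so its numbers will not match the dataset's values exactly. Chroma, harmony, perceptr, tempo, and MFCCs are not covered.

```python
import numpy as np


def frame_signal(y, frame_length=2048, hop_length=512):
    """Hypothetical helper: split a 1-D signal into overlapping frames (no padding)."""
    if len(y) < frame_length:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(y) - frame_length) // hop_length
    return np.stack([y[i * hop_length : i * hop_length + frame_length]
                     for i in range(n_frames)])


def summarize_features(y, sr, frame_length=2048, hop_length=512, rolloff_pct=0.85):
    """Hypothetical helper: simplified per-frame features, summarized as mean and variance."""
    frames = frame_signal(np.asarray(y, dtype=float), frame_length, hop_length)

    # RMS energy of each frame (a loudness measure)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))

    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs
    negative = np.signbit(frames)
    zcr = np.mean(negative[:, 1:] != negative[:, :-1], axis=1)

    # Magnitude spectrum of each Hann-windowed frame, plus each bin's frequency in Hz
    mag = np.abs(np.fft.rfft(frames * np.hanning(frame_length), axis=1))
    freqs = np.fft.rfftfreq(frame_length, d=1.0 / sr)
    total = mag.sum(axis=1)
    weights = mag / (total[:, None] + 1e-12)  # each frame's spectrum now sums to ~1

    # Spectral centroid: magnitude-weighted mean frequency
    centroid = weights @ freqs
    # Spectral bandwidth: magnitude-weighted standard deviation around the centroid
    bandwidth = np.sqrt(np.sum(weights * (freqs[None, :] - centroid[:, None]) ** 2, axis=1))
    # Rolloff: first frequency at which cumulative magnitude reaches rolloff_pct of the total
    reached = np.cumsum(mag, axis=1) >= rolloff_pct * total[:, None]
    rolloff = freqs[np.argmax(reached, axis=1)]

    per_frame = {
        "rms": rms,
        "zero_crossing_rate": zcr,
        "spectral_centroid": centroid,
        "spectral_bandwidth": bandwidth,
        "rolloff": rolloff,
    }
    # Collapse each per-frame series into a _mean / _var pair, as in the CSV columns
    summary = {}
    for name, values in per_frame.items():
        summary[f"{name}_mean"] = float(np.mean(values))
        summary[f"{name}_var"] = float(np.var(values))
    return summary


if __name__ == "__main__":
    sr = 22050
    t = np.arange(sr) / sr  # one second of audio
    tone = np.sin(2 * np.pi * 440 * t)  # pure 440 Hz sine
    noise = np.random.default_rng(0).normal(size=sr)  # white noise
    for name, signal in [("tone", tone), ("noise", noise)]:
        s = summarize_features(signal, sr)
        print(name, round(s["spectral_centroid_mean"]), round(s["zero_crossing_rate_mean"], 3))
```

Comparing a pure tone with white noise shows the intended contrast: the tone's centroid sits near its single frequency with a narrow bandwidth and low zero-crossing rate, while noise spreads energy across the whole spectrum.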
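The machine learning method I hope to use is k-nearest neighbors (kNN), to see whether these variables correspond to the labeled genre. Because kNN is a supervised classifier, which predicts a song's genre from the labels of the most similar songs in feature space, discovering new or unexpected genres would additionally call for an unsupervised clustering method such as k-means.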
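k-nearest neighbors is a supervised method: each song is assigned the majority genre among the k songs closest to it in feature space. A natural first check of whether the features correspond to the labeled genre is therefore to hold out some songs and see how often kNN recovers their labels. The sketch below is a from-scratch NumPy version under assumed choices (z-scored features, Euclidean distance, k=5, an 80/20 split); the helper names are hypothetical, not part of the original plan. Scaling matters because kNN is distance-based: columns such as `spectral_centroid_var`, whose values run into the hundreds of thousands, would otherwise dominate the distance. scikit-learn's `StandardScaler` plus `KNeighborsClassifier` would do the same job.

```python
import numpy as np
from collections import Counter


def standardize(X_train, X_test):
    """Z-score both sets using the training set's column means and standard deviations."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X_train - mean) / std, (X_test - mean) / std


def knn_predict(X_train, y_train, X_test, k=5):
    """Label each test row with the most common label among its k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    X_train, X_test = standardize(X_train, X_test)

    # Squared Euclidean distances, shape (n_test, n_train); the ordering is all kNN needs
    sq_dists = (
        (X_test ** 2).sum(axis=1)[:, None]
        + (X_train ** 2).sum(axis=1)[None, :]
        - 2.0 * X_test @ X_train.T
    )
    nearest = np.argsort(sq_dists, axis=1)[:, :k]

    # Majority vote; on a tie, Counter keeps the label that appears first,
    # i.e. the tied label with the closest neighbor
    return np.array([Counter(y_train[idx]).most_common(1)[0][0] for idx in nearest])


def split_indices(n, test_frac=0.2, seed=0):
    """Hypothetical helper: shuffle row indices and split them into train/test sets."""
    order = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * test_frac))
    return order[n_test:], order[:n_test]


if __name__ == "__main__":
    # Synthetic stand-in for the feature table: two well-separated "genres"
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(6.0, 1.0, (50, 4))])
    y = np.array(["blues"] * 50 + ["metal"] * 50)
    train, test = split_indices(len(y))
    preds = knn_predict(X[train], y[train], X[test], k=5)
    print("accuracy:", np.mean(preds == y[test]))
```

On the real data, `X` could be `df.drop(columns=["filename", "length", "label"]).to_numpy()` and `y` could be `df["label"].to_numpy()`; `filename` and `label` are an identifier and the target rather than features, and dropping `length` assumes it describes clip duration rather than genre. A low held-out accuracy for some genres would suggest those labels are not well separated by these features, which is where a clustering method would come in.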