Exploring Music Representation Learning for Detection of Finer-grained Details
Abstract
The art of music is profound and involves the fine-tuning of many fine-grained attributes. Taking one such attribute, the music tonic, as an example, this paper investigates whether machine learning can be effectively applied to detect fine-grained features from various representations of music. It describes the methodology and experiments used to determine how effectively audio representations and non-linear machine learning classifiers, such as Support Vector Machines, K-Nearest Neighbors, and tree-based methods, capture and detect tonic information. The effectiveness of Mel-frequency cepstral coefficients is compared with that of two other audio representations: Bark-frequency cepstral coefficients and Mel spectrogram data. Feature extraction of the Mel-frequency cepstral coefficients is performed using two different tools for comparison. A spectral analysis of the music renditions in a chosen dataset using Principal Component Analysis and other dimensionality reduction techniques, namely t-distributed stochastic neighbor embedding, an auto-encoder, and Uniform Manifold Approximation and Projection, provided valuable insights into the representation learning of music in general. Classification using fine-tuned machine learning models demonstrated the feasibility of automating the detection, and possible prediction, of the music tonic. The work also produced a dataset of vocal renditions by the second and third authors of this paper, which was used for some of the experiments.
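To make the Mel-frequency cepstral coefficient (MFCC) representation mentioned above concrete, the sketch below computes MFCC-like features from scratch with NumPy: frame the signal, take the power spectrum, apply a triangular mel filterbank, take the log, and decorrelate with a DCT-II. This is an illustrative outline only, not the paper's actual extraction tools; all function names, frame sizes, and filter counts here are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for j in range(left, center):
            fb[i - 1, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):
            fb[i - 1, j] = (right - j) / max(right - center, 1)
    return fb

def mfcc(signal, sr, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    # Frame the signal with a Hann window and take each frame's power spectrum.
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    power = np.array(frames)                         # (n_frames, n_fft//2 + 1)
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II over the filter axis; keep the first n_coeffs cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return log_mel @ dct.T                           # (n_frames, n_coeffs)

# Toy input: one second of a 440 Hz sine tone at a 16 kHz sample rate.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
feats = mfcc(tone, sr)
print(feats.shape)  # one 13-coefficient vector per analysis frame
```

In practice, feature extraction libraries differ in windowing, filterbank normalization, and DCT conventions, which is one reason the paper compares two extraction tools on the same representation.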