Exploring Music Representation Learning for Detection of Finer-grained Details
Abstract
The art of music is profound and involves the fine-tuning of many fine-grained attributes. Taking one such attribute, the music tonic, as an example, this paper investigates whether machine learning can be effectively applied to detect fine-grained features from various representations of music. It describes the methodology and experiments used to determine how effectively audio representations and non-linear machine learning classifiers, such as Support Vector Machines, K-Nearest Neighbors, and tree-based methods, capture and detect tonic information. The effectiveness of Mel-frequency cepstral coefficients is compared with that of two other audio representations: Bark-frequency cepstral coefficients and Mel spectrogram data. Feature extraction of the Mel-frequency cepstral coefficients is performed using two different tools for comparison. A spectral analysis of the music renditions in a chosen dataset using Principal Component Analysis and other dimensionality reduction techniques, namely t-distributed stochastic neighbor embedding, an auto-encoder, and Uniform Manifold Approximation and Projection, provided valuable insights into the representation learning of music in general. Classification using fine-tuned machine learning models demonstrated the feasibility of automating the detection, and possible prediction, of the music tonic. The work also produced a dataset of vocal renditions by the second and third authors of this paper, which was used for some of the experiments.
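To make the Mel-frequency cepstral coefficient (MFCC) representation mentioned above concrete, the sketch below computes MFCC-like features from scratch with NumPy: frame the signal, take the power spectrum, apply a triangular mel filterbank, take the log, and decorrelate with a DCT-II. This is an illustrative outline only, not the paper's actual extraction tools; all function names, frame sizes, and filter counts here are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for j in range(left, center):
            fb[i - 1, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):
            fb[i - 1, j] = (right - j) / max(right - center, 1)
    return fb

def mfcc(signal, sr, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    # Frame the signal with a Hann window and take each frame's power spectrum.
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    power = np.array(frames)                         # (n_frames, n_fft//2 + 1)
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II over the filter axis; keep the first n_coeffs cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return log_mel @ dct.T                           # (n_frames, n_coeffs)

# Toy input: one second of a 440 Hz sine tone at a 16 kHz sample rate.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
feats = mfcc(tone, sr)
print(feats.shape)  # one 13-coefficient vector per analysis frame
```

In practice, feature extraction libraries differ in windowing, filterbank normalization, and DCT conventions, which is one reason the paper compares two extraction tools on the same representation.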