Interpretable Framework for Continuous Music Emotion Recognition Based on Feature Selection and Machine Learning

Abstract

Music emotion recognition aims to understand how acoustic features shape the emotions conveyed by music, and it is widely applied in music composition, therapeutic interventions, and personalized recommendation systems. The application of deep learning methods has significantly improved the accuracy of emotion recognition; however, such research generally faces the black-box problem: the prediction process cannot be explained. In this study, we combine three feature selection methods with two machine learning models to present an interpretable framework for continuous emotion prediction. The framework is based on the Valence-Arousal model, currently the most widely used model in emotion recognition. The SHAP method is the key to this interpretable framework, as it quantifies the contribution of each musical feature in the optimal model to the dimensions of musical emotion. This study shows that spectral entropy and the first Mel-frequency cepstral coefficient play a dominant role in music emotion prediction, while features such as rhythm and pitch also contribute substantially to specific emotion dimensions. These findings identify acoustic determinants of musical emotions and highlight the potential of explainable machine learning in affective computing.
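The abstract does not include code, but the pipeline it describes (select acoustic features, fit a regressor per Valence-Arousal dimension, attribute predictions with SHAP) can be illustrated with a minimal sketch. The sketch below assumes scikit-learn and the `shap` library, uses synthetic stand-in features with hypothetical names (`spectral_entropy`, `mfcc_1`, etc.), and models only the valence dimension; it is not the authors' implementation.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for extracted acoustic features (hypothetical names/values).
rng = np.random.default_rng(0)
feature_names = ["spectral_entropy", "mfcc_1", "tempo",
                 "pitch_salience", "rms_energy", "zero_crossing_rate"]
X = rng.normal(size=(200, len(feature_names)))
# Toy valence target driven mostly by the first two features.
y_valence = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=200)

# One possible filter-style feature selection step (the paper uses three methods).
selector = SelectKBest(score_func=f_regression, k=4).fit(X, y_valence)
X_sel = selector.transform(X)
selected = [n for n, keep in zip(feature_names, selector.get_support()) if keep]

# Train a regressor for the valence dimension.
X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y_valence, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# SHAP values attribute each prediction to the individual input features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)

# Rank features by mean absolute SHAP value (global importance).
mean_abs = np.abs(shap_values).mean(axis=0)
for name, importance in sorted(zip(selected, mean_abs), key=lambda t: -t[1]):
    print(f"{name}: mean |SHAP| = {importance:.3f}")
```

In practice the same procedure would be repeated for the arousal dimension and for each candidate model, with the SHAP summaries of the best-performing model providing the per-feature contributions reported in the study.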