Interpretable Framework for Continuous Music Emotion Recognition Based on Feature Selection and Machine Learning
Abstract
Music emotion recognition aims to understand how acoustic features shape the emotions conveyed by music, with applications in music composition, therapeutic interventions, and personalized recommendation systems. Deep learning methods have substantially improved the accuracy of emotion recognition, but they generally suffer from the black-box problem: their prediction process cannot be explained. In this study, we combine three feature selection methods with two machine learning models to present an interpretable framework for continuous emotion prediction. The framework is built on the Valence-Arousal model, currently the most widely used representation in emotion recognition. The SHAP method is the key to interpretability, as it quantifies the contribution of each musical feature in the best-performing model to each dimension of musical emotion. The results show that spectral entropy and the first Mel-frequency cepstral coefficient play a dominant role in music emotion prediction, while features such as rhythm and pitch also contribute substantially to specific emotion dimensions. These findings clarify the acoustic determinants of musical emotion and highlight the potential of explainable machine learning in affective computing.
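The abstract describes explaining a trained regressor with SHAP but includes no code. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a tree-based regressor for the arousal dimension, synthetic placeholder data, and hypothetical feature names ("spectral_entropy", "mfcc_1", etc.) chosen only to mirror the features discussed above.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Hypothetical acoustic feature matrix (n_songs x n_features) and arousal labels.
# Real experiments would use extracted audio features and annotated V-A ratings.
rng = np.random.default_rng(0)
feature_names = ["spectral_entropy", "mfcc_1", "tempo", "pitch_mean", "rms_energy"]
X = rng.normal(size=(200, len(feature_names)))
y_arousal = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=200)

# Fit a regressor for one emotion dimension and explain it with SHAP values.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y_arousal)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_songs, n_features)

# The mean absolute SHAP value per feature summarizes its overall contribution
# to the predicted arousal, analogous to the feature rankings reported above.
importance = np.abs(shap_values).mean(axis=0)
for name, score in sorted(zip(feature_names, importance), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```

Running the same procedure per emotion dimension (valence and arousal) yields the per-feature contributions that such an interpretable framework reports.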