Feature Significance in Speech Emotion Recognition
Abstract
In the field of speech emotion recognition, the choice of audio features can dramatically influence the accuracy and effectiveness of classification systems. This study presents a comprehensive comparative analysis of feature significance, shedding light on how different audio characteristics contribute to the success of emotion recognition methodologies. We analyze speech-based emotion recognition techniques using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), working directly with raw audio files. After the raw audio is pre-processed, we extract features including Log-Mel Spectrograms (LMS), Mel-Frequency Cepstral Coefficients (MFCCs), pitch, and energy. We measure the relevance of these features to emotion classification through a series of approaches that includes Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNNs), Hidden Markov Models (HMMs), and Deep Neural Networks (DNNs). On a 14-class classification problem covering two genders and seven emotions, we obtained 56% accuracy with a 4-layer 2-dimensional CNN using Log-Mel Spectrogram features. Our results show that careful selection of audio features matters more for emotion recognition performance than model complexity.
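To make the pipeline concrete, the sketch below shows one plausible way to extract the features named above with librosa and to assemble a 4-layer 2-dimensional CNN over Log-Mel Spectrograms with a 14-way output (7 emotions x 2 genders). The specific hyperparameters (sample rate, number of Mel bands, number of MFCCs, filter counts, pooling sizes, dropout rate) are illustrative assumptions and are not taken from the study's reported configuration.

```python
# Illustrative sketch only: feature extraction plus a 4-layer 2-D CNN classifier.
# Hyperparameter choices below are assumptions, not the paper's exact setup.
import numpy as np
import librosa
import tensorflow as tf


def extract_features(path, sr=22050, n_mels=128, n_mfcc=40):
    """Load one audio file and compute the features discussed in the abstract:
    Log-Mel Spectrogram, MFCCs, pitch, and energy."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)           # Log-Mel Spectrogram (LMS)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # MFCCs
    pitch = librosa.yin(y, fmin=65, fmax=2093, sr=sr)        # fundamental-frequency track
    energy = librosa.feature.rms(y=y)[0]                     # frame-wise RMS energy
    return log_mel, mfcc, pitch, energy


def build_cnn(input_shape, n_classes=14):
    """A minimal 4-layer 2-D CNN over Log-Mel Spectrogram "images",
    ending in a 14-way softmax (assumed architecture)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),             # e.g. (n_mels, frames, 1)
        tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(128, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(128, 3, activation="relu", padding="same"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

In this kind of setup, the Log-Mel Spectrogram is treated as a single-channel image and fed to the CNN, while scalar tracks such as pitch and energy are typically summarized (e.g., mean and variance per utterance) if they are used alongside the spectrogram input.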