Feature Significance in Speech Emotion Recognition
Abstract
In the field of speech emotion recognition, the choice of audio features can dramatically influence the accuracy and effectiveness of classification systems. This study presents a comprehensive comparative analysis of feature significance, shedding light on how different audio characteristics contribute to the success of emotion recognition methodologies. We analyze speech-based emotion recognition techniques using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), working directly with raw audio files. After the raw audio is pre-processed, we extract features including Log-Mel Spectrograms (LMS), Mel-Frequency Cepstral Coefficients (MFCCs), pitch, and energy. We measure the relevance of these features to emotion classification through a series of approaches that includes Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNNs), Hidden Markov Models (HMMs), and Deep Neural Networks (DNNs). On a 14-class classification problem covering two genders and seven emotions, we obtained 56% accuracy with a 4-layer 2-dimensional CNN using Log-Mel Spectrogram features. Our results show that careful selection of audio features matters more for emotion recognition performance than model complexity.
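To make the pipeline concrete, the sketch below shows one plausible way to extract the features named above with librosa and to assemble a 4-layer 2-dimensional CNN over Log-Mel Spectrograms with a 14-way output (7 emotions x 2 genders). The specific hyperparameters (sample rate, number of Mel bands, number of MFCCs, filter counts, pooling sizes, dropout rate) are illustrative assumptions and are not taken from the study's reported configuration.

```python
# Illustrative sketch only: feature extraction plus a 4-layer 2-D CNN classifier.
# Hyperparameter choices below are assumptions, not the paper's exact setup.
import numpy as np
import librosa
import tensorflow as tf


def extract_features(path, sr=22050, n_mels=128, n_mfcc=40):
    """Load one audio file and compute the features discussed in the abstract:
    Log-Mel Spectrogram, MFCCs, pitch, and energy."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)           # Log-Mel Spectrogram (LMS)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # MFCCs
    pitch = librosa.yin(y, fmin=65, fmax=2093, sr=sr)        # fundamental-frequency track
    energy = librosa.feature.rms(y=y)[0]                     # frame-wise RMS energy
    return log_mel, mfcc, pitch, energy


def build_cnn(input_shape, n_classes=14):
    """A minimal 4-layer 2-D CNN over Log-Mel Spectrogram "images",
    ending in a 14-way softmax (assumed architecture)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),             # e.g. (n_mels, frames, 1)
        tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(128, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(128, 3, activation="relu", padding="same"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

In this kind of setup, the Log-Mel Spectrogram is treated as a single-channel image and fed to the CNN, while scalar tracks such as pitch and energy are typically summarized (e.g., mean and variance per utterance) if they are used alongside the spectrogram input.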