Speech Emotion Recognition Using Multi-Scale Global–Local Representation Learning with Feature Pyramid Network

Yuhua Wang
Jianxing Huang
Zhengdao Zhao
Haiyan Lan
Xinjia Zhang

Abstract

Speech emotion recognition (SER) is important for facilitating natural human–computer interaction. In speech sequence modeling, a key challenge is learning context-aware sentence representations and the temporal dynamics of paralinguistic features so that emotional semantics can be understood unambiguously. In previous studies, SER methods built on single-scale cascaded feature-extraction modules could not effectively preserve the temporal structure of speech signals in deep layers, which degraded sequence-modeling performance. To address these challenges, this paper proposes a novel multi-scale feature pyramid network, in which enhanced multi-scale convolutional neural networks (MSCNNs) significantly improve the extraction of multi-granular emotional features. Experimental results on the IEMOCAP corpus demonstrate the effectiveness of the proposed approach, which achieves a weighted accuracy (WA) of 71.79% and an unweighted accuracy (UA) of 73.39%. On the RAVDESS dataset, the model achieves a UA of 86.5%. These results validate the system's performance and highlight its competitive advantage.
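The abstract reports WA and UA without defining them. The sketch below assumes the convention common in IEMOCAP-based SER work (an assumption; the abstract does not state it): WA is overall accuracy across all test utterances, and UA is the mean of per-class recall, which scikit-learn calls balanced accuracy. The helper names `weighted_accuracy` and `unweighted_accuracy` and the toy labels are hypothetical illustrations, not the authors' evaluation code or data.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """WA (assumed convention): share of all utterances classified correctly."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """UA (assumed convention): mean per-class recall over the true classes."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)  # number of utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / n for c, n in support.items()]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (not IEMOCAP data); "neu" is the majority class.
y_true = ["neu"] * 6 + ["ang"] * 2 + ["sad"] * 2
y_pred = ["neu"] * 4 + ["sad"] * 2 + ["ang"] * 2 + ["sad"] * 2

print(round(weighted_accuracy(y_true, y_pred), 4))    # 0.8
print(round(unweighted_accuracy(y_true, y_pred), 4))  # 0.8889
```

Because UA weights every emotion class equally, it can exceed WA when minority classes are recognized better than the majority class. That is what happens in this toy case, and UA is likewise higher than WA in the reported IEMOCAP results.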

Published version: 10.3390/app142411494 (Dec 10, 2024)
Preprint version: 10.20944/preprints202410.1002.v1 (Oct 14, 2024)

Related articles

Facial Emotion Recognition Based on ResNet18 with Multi-Dimensional Attention Mechanisms

This article has 4 authors:
1. 阳西
2. 陈雪吴
3. 天宇孟
4. 昆珍李
This article has no evaluations. Latest version: May 8, 2025.

Real-Time Emotion Recognition with CNN and LSTM

This article has 1 author:
1. Sanmay Kotkar
This article has no evaluations. Latest version: May 9, 2025.

EmotionSense: A Deep Learning-Based Text Emotion Classifier Using NLP for Real-Time Analysis

This article has 2 authors:
1. N Prem sankar
2. Pranav Sanjay Manapure
This article has no evaluations. Latest version: May 15, 2025.
