A Comprehensive Review of Unimodal and Multimodal Emotion Recognition
Abstract
Emotion recognition is a fundamental component of human-centered intelligent systems, supporting applications in healthcare, education, marketing, and human–computer interaction. Despite rapid progress driven by deep learning across facial, speech, textual, and multimodal settings, the literature remains difficult to compare due to inconsistent emotion models, heterogeneous datasets, and varying evaluation protocols. This survey addresses that gap by providing a unified synthesis of deep learning-based unimodal and multimodal emotion recognition within a coherent analytical framework covering emotion modeling, dataset curation, representation learning, fusion strategies, and evaluation. Rather than merely listing methods, we organize existing work around the key structural choices and trade-offs that affect generalization. For unimodal approaches, we analyze how facial, speech, and textual methods increasingly rely on self-supervised pretraining to mitigate annotation scarcity, while continuing to face modality-specific limitations. For multimodal systems, we examine alignment, modality dominance, complementarity, robustness, and the emerging role of large language models in affective reasoning. We further highlight persistent challenges, including label ambiguity, cross-dataset generalization, fairness, and the gap between benchmark performance and real-world deployment. This survey thus offers both a unified perspective and a roadmap for future research. Resources are available at https://github.com/jackchen69/Awesome-Emotion-Models.