EGY-MER: Establishing The First Egyptian Arabic Multimodal Emotion Recognition Dataset for Affective Computing
Abstract
This paper introduces EGY-MER, a new multimodal dataset for emotion recognition in Egyptian Arabic, which aims to fill an important gap in affective computing for this dialect. The data samples were collected and organized using the MODALINK pipeline, which provides synchronized multimodal alignment and high-quality annotations. Each sample comprises transcribed speech, the corresponding audio, and facial frames, together with an associated emotion category. Three pretrained encoders were used to establish baseline results: text was processed with AraBERTv2, speech with Wav2Vec2-ER, and vision with a Swin Transformer. A late-fusion strategy combined the high-level representations from each encoder. Baseline experiments showed that combining the modalities improves emotion recognition performance over the unimodal configurations. Weighted-F1 and macro-F1 scores suggest the potential of cross-modal features for capturing affective cues in Egyptian Arabic, and the results demonstrate the dataset's consistency and applicability to multimodal learning research. This work presents the first dataset for multimodal emotion recognition in Egyptian Arabic, along with reproducible baselines. The aim is for the dataset and the provided benchmark models to facilitate further research on emotion recognition for low-resource languages, multimodal fusion, and affective computing in Arabic.
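The late-fusion baseline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding dimensions, the number of emotion categories, and the random stand-in vectors (in place of real AraBERTv2, Wav2Vec2-ER, and Swin Transformer outputs) are all assumptions made for the example, and the linear classification head uses placeholder weights rather than trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in pooled embeddings for one sample. The 768-dim size is an
# assumption for illustration; the paper does not specify feature sizes.
text_emb = rng.standard_normal(768)    # stands in for an AraBERTv2 text embedding
audio_emb = rng.standard_normal(768)   # stands in for Wav2Vec2-ER audio features
vision_emb = rng.standard_normal(768)  # stands in for Swin Transformer visual features

# Late fusion: concatenate the high-level representation from each
# modality encoder into a single joint feature vector.
fused = np.concatenate([text_emb, audio_emb, vision_emb])  # shape (2304,)

# Classify the fused vector with a linear head followed by softmax.
# num_emotions and the weights are hypothetical placeholders.
num_emotions = 6
W = rng.standard_normal((num_emotions, fused.shape[0])) * 0.01
b = np.zeros(num_emotions)

logits = W @ fused + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()

predicted = int(np.argmax(probs))
print(fused.shape, predicted)
```

The design point this illustrates is that late fusion keeps each encoder independent: only the final pooled representations interact, at the classification head, rather than at intermediate layers as in early or mid-level fusion.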