Emotion Recognition from Bangla Dialect Speech using Privacy-Aware Deep Learning Models: A Comparative Analysis


Abstract

Speech emotion recognition (SER) is critical for building affective, context-aware human-computer interaction systems. However, SER research in low-resource languages such as Bangla remains limited, particularly with respect to dialectal variety and privacy-preserving model training. This paper introduces a Bangla dialect-sensitive, privacy-aware SER framework that recognizes five distinct emotional states: neutral, happy, sad, angry, and surprise. We study three hybrid deep learning architectures: a composite EfficientNet-Vision Transformer (EfficientNet-ViT) model; a CNN-BiLSTM that extracts spatial-temporal patterns; and EmoDARTS, which uses differentiable architecture search for automatic optimization. With a 93.0% F1-score and 95.9% accuracy, EfficientNet-ViT outperforms the other models in a federated learning setting while keeping data secure across distributed devices. To address data scarcity and improve model generalizability, we apply a cross-lingual transfer learning technique: models are pretrained on high-resource English SER datasets (RAVDESS, SAVEE, and TESS) and then fine-tuned on Bangla datasets (SUBESCO, BanglaSER, and a newly constructed dialect-rich corpus). The proposed technique effectively addresses the challenges of dialectal diversity, resource constraints, and privacy in Bangla SER. It shows strong promise for scalable deployment in real-world applications and offers a reproducible blueprint for SER in other low-resource language settings.
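The privacy-aware training described above keeps raw speech on client devices and shares only model parameters with a central server. The abstract does not specify the aggregation rule; assuming a standard FedAvg-style weighted average (a common choice for federated learning), one aggregation round can be sketched as:

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: weighted average of client parameters.

    client_weights: list of dicts mapping parameter name -> np.ndarray,
                    one dict per client (raw audio never leaves the client).
    client_sizes:   number of local training samples per client, used as
                    the averaging weight.
    """
    total = sum(client_sizes)
    return {
        name: sum(w[name] * (n / total)
                  for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }

# Hypothetical round with two clients holding dialect-specific Bangla data.
clients = [
    {"w": np.array([1.0, 2.0])},  # client with 1 local sample
    {"w": np.array([3.0, 4.0])},  # client with 3 local samples
]
global_weights = federated_average(clients, client_sizes=[1, 3])
print(global_weights["w"])  # → [2.5 3.5], weighted toward the larger client
```

The server redistributes `global_weights` to all clients for the next local training round; only parameters, never speech recordings, cross the network.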
