An End-to-End Bengali Speech-to-Sign Language Generation Framework Using Fine-Tuned Whisper ASR and Grapheme-Level Visual Mapping

Anzim Hasan Nabil
Urbo Saha
Md Mahmudul Alom Sifat
Md Atiqur Rahman Jishan
S.M. Foyez Alex
Rafiul Alif
Sadia Akter Sarika
Arnab Nandi Eshan

Abstract

This paper presents an end-to-end speech-to-sign language generation framework that combines a fine-tuned Whisper-small automatic speech recognition (ASR) model with a grapheme-level visual mapping unit for sign synthesis. The system addresses the communication gap faced by Bengali-speaking deaf and hard-of-hearing people by translating spoken Bengali into synchronized sign language videos in real time. The ASR module is optimized through strategic layer freezing, Bengali-specific text normalization, and fine-tuning on the Common Voice 13.0 (bn) dataset, reaching a word error rate (WER) of 35.41% and a character error rate (CER) of 11.45% after 5.5k fine-tuning steps. The transcribed text is split into its component graphemes by a customized regular expression that handles Bengali compound characters and diacritical marks. Each grapheme is then mapped to its corresponding sign in a pre-curated image database of Bangla Sign Language. Using OpenCV, the images are annotated, arranged in the correct sign sequence, and rendered as interpretable video output at a fixed frame rate. Compared against several baseline Bengali ASR models, the system achieved higher transcription accuracy while also providing explainable visual output missing in prior work. Because it is modular, the system can also be extended to other sign systems and languages. This work offers a novel, practical, and culturally appropriate assistive technology that improves access for the Bengali-speaking deaf and hard-of-hearing community and paves the way for future bidirectional speech–sign translation systems.
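The grapheme-splitting and lookup steps can be illustrated with a minimal Python sketch. The abstract does not give the actual regular expression or the layout of the image database, so everything here is an assumption: the character classes are taken from the Unicode Bengali block, a conjunct together with its vowel sign and modifiers is treated as one unit, and `split_graphemes`, `lookup_signs`, and the `sign_db` dictionary are hypothetical names introduced only for illustration.

```python
import re

# Character classes from the Unicode Bengali block (assumed, not the paper's pattern).
CONSONANT = "\u0995-\u09B9\u09CE\u09DC\u09DD\u09DF"
INDEP_VOWEL = "\u0985-\u0994"
VOWEL_SIGN = "\u09BE-\u09CC\u09D7"
MODIFIER = "\u0981-\u0983"  # candrabindu, anusvara, visarga
NUKTA = "\u09BC"
HASANTA = "\u09CD"          # joins consonants into a conjunct

GRAPHEME_RE = re.compile(
    # consonant, optional further consonants joined by hasanta,
    # optional trailing hasanta, optional vowel sign, optional modifiers
    rf"[{CONSONANT}]{NUKTA}?(?:{HASANTA}[{CONSONANT}]{NUKTA}?)*"
    rf"{HASANTA}?[{VOWEL_SIGN}]?[{MODIFIER}]*"
    # independent vowel with optional modifiers
    rf"|[{INDEP_VOWEL}][{MODIFIER}]*"
    # any other non-whitespace character (digits, punctuation, ...)
    rf"|\S"
)


def split_graphemes(text):
    """Split text into grapheme units, dropping whitespace."""
    return GRAPHEME_RE.findall(text)


def lookup_signs(text, sign_db):
    """Return (image paths found, graphemes with no entry) for the text.

    sign_db is a hypothetical dict mapping a grapheme to an image path.
    """
    found, missing = [], []
    for grapheme in split_graphemes(text):
        if grapheme in sign_db:
            found.append(sign_db[grapheme])
        else:
            missing.append(grapheme)
    return found, missing


if __name__ == "__main__":
    # The conjunct stays whole and the vowel sign stays attached to its consonant.
    print(split_graphemes("ক্ষতি"))  # ['ক্ষ', 'তি']
```

Keeping a conjunct as one unit is only one possible design; a system that fingerspells individual letters might instead split each unit further before the lookup, and the list of missing graphemes makes gaps in the image database visible.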

Version published to 10.21203/rs.3.rs-7652580/v1 on Research Square
Oct 1, 2025

Related articles

BdSLW401: Transformer-Based Word-Level Bangla Sign Language Recognition Using Relative Quantization Encoding (RQE)

This article has 4 authors:
1. Husne Ara Rubaiyeat
2. Njayou Youssouf
3. Md Kamrul Hasan
4. Hasan Mahmud
This article has no evaluations. Latest version Nov 5, 2025

Arabic Sign Language (ARSL) Recognition and Translation into Text

This article has 3 authors:
1. Lakehal Maya Amani
2. Haddouche Milissa
3. Kaddouri Nassim
This article has no evaluations. Latest version Sep 17, 2025

Benchmarking OCR and Vision-Language Models for Turkish Text Recognition: A Comprehensive Evaluation Using Synthetic Data

This article has 4 authors:
1. Yasin Yılmaz
2. Erol Görkem Hanoğlu
3. Ayşe Gül Özkan
4. Kasım Öztoprak
This article has no evaluations. Latest version Oct 14, 2025
