A Dual-Architecture Deep Learning Pipeline for Real-Time High-Accuracy Arabic Sign Language Recognition
Abstract
This research presents a deep learning-based pipeline for Arabic Sign Language (ArSL) recognition to bridge the communication gap for the Deaf and Hard of Hearing community. We propose a robust system that processes both static images and live video streams, translating isolated gestures into corresponding alphabet letters. Our methodology integrates advanced image preprocessing using Google's MediaPipe for hand landmark detection, along with data augmentation. Two classification approaches are developed: a fine-tuned ResNet18 model achieving 98% test accuracy, and an enhanced architecture employing EfficientNet-B2 as a feature extractor combined with a Random Forest classifier, which achieves 99% accuracy on a diverse, participant-rich dataset of 7,856 labelled RGB images. The superior performance of the latter model demonstrates effective feature extraction and generalization. A functional real-time application validates the system's practical utility, offering an accurate and efficient tool for ArSL recognition.
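The abstract does not say how the detected hand landmarks feed into preprocessing. One common approach, assumed here for illustration only, is to crop the frame to a padded square around the hand before passing it to the classifier. The sketch below takes landmarks as normalized (x, y) pairs in [0, 1], the coordinate convention MediaPipe uses for its 21 hand landmarks, and returns a pixel crop box. The function name `landmarks_to_crop_box` and the `margin` parameter are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def landmarks_to_crop_box(
    landmarks: List[Tuple[float, float]],
    width: int,
    height: int,
    margin: float = 0.25,
) -> Tuple[int, int, int, int]:
    """Return a (left, top, right, bottom) pixel box around the hand.

    landmarks: normalized (x, y) pairs in [0, 1], one per hand landmark.
    width, height: frame size in pixels.
    margin: extra padding on each side, as a fraction of the hand's extent
            (an illustrative default, not a value from the paper).
    """
    if not landmarks:
        raise ValueError("at least one landmark is required")

    # Convert normalized coordinates to pixel coordinates.
    xs = [x * width for x, _ in landmarks]
    ys = [y * height for _, y in landmarks]

    # Build a square box centred on the hand so the crop keeps its aspect
    # ratio when resized to the classifier's input size.
    side = max(max(xs) - min(xs), max(ys) - min(ys)) * (1 + 2 * margin)
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2

    # Clamp to the frame so the box can be used directly for slicing.
    left = max(0, int(round(cx - side / 2)))
    top = max(0, int(round(cy - side / 2)))
    right = min(width, int(round(cx + side / 2)))
    bottom = min(height, int(round(cy + side / 2)))
    return left, top, right, bottom
```

With a frame stored as a height × width × channels array, the crop would be `frame[top:bottom, left:right]`. Near the frame edge the clamped box is no longer square, so a real pipeline might pad the crop instead of clamping.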