A Dual-Architecture Deep Learning Pipeline for Real-Time High-Accuracy Arabic Sign Language Recognition

Abstract

This research presents a deep learning-based pipeline for Arabic Sign Language (ArSL) recognition to bridge the communication gap for the Deaf and Hard of Hearing community. We propose a robust system that processes both static images and live video streams, translating isolated gestures into corresponding alphabet letters. Our methodology integrates advanced image preprocessing using Google's MediaPipe for hand landmark detection, along with data augmentation. Two classification approaches are developed: a fine-tuned ResNet18 model achieving 98% test accuracy, and an enhanced architecture employing EfficientNet-B2 as a feature extractor combined with a Random Forest classifier, which achieves 99% accuracy on a diverse, participant-rich dataset of 7,856 labelled RGB images. The superior performance of the latter model demonstrates effective feature extraction and generalization. A functional real-time application validates the system's practical utility, offering an accurate and efficient tool for ArSL recognition.
