Advancing Sign Language Interpretation with Transfer Learning and Multimodal Features
Abstract
Sign languages are rich visual languages used by deaf and hard-of-hearing communities around the world. An acute shortage of trained human interpreters motivates the development of automatic sign language recognition systems. Building on the Sign Language Interpreter using Deep Learning project, this paper introduces an enhanced evaluation framework for small-vocabulary interpreters and reports new experiments that go beyond the baseline convolutional neural network (CNN). We fine-tune state-of-the-art architectures such as ResNet-50 and EfficientNet on the American Sign Language (ASL) alphabet dataset, integrate hand-landmark features extracted by MediaPipe into a recurrent backbone, and perform cross-dataset evaluation on a large public dataset. Additional metrics—including macro/micro F1, Cohen’s kappa and per-class recall—provide a more nuanced assessment than overall accuracy. Robustness tests examine lighting, background clutter, signer diversity and adversarial perturbations. We also discuss ethical and accessibility considerations and reflect on the practical impact of hackathon-style prototypes. The results demonstrate that lightweight models can be rapidly improved via transfer learning and multimodal fusion while retaining usability on commodity hardware. Our work offers a blueprint for researchers and practitioners seeking to translate small-scale prototypes into equitable, scalable accessibility solutions. The code supporting this work is available at https://github.com/Manishms18/Sign-Language-Advance.
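To make the transfer-learning step concrete, the sketch below fine-tunes an ImageNet-pretrained ResNet-50 on static ASL alphabet images. It assumes a TensorFlow/Keras pipeline; the 26-letter class count, the dataset path, the two-stage freeze/unfreeze schedule, and all hyperparameters are illustrative assumptions rather than the exact configuration used in this work.

```python
# Hedged sketch of ResNet-50 transfer learning on ASL alphabet images.
# Class count, dataset path, and hyperparameters are assumptions for illustration.
import tensorflow as tf

NUM_CLASSES = 26       # assumed: one class per static ASL letter
IMG_SIZE = (224, 224)  # ResNet-50's standard input resolution

# Input pipeline: one subfolder per letter (hypothetical path), with
# ResNet-style pixel preprocessing applied on the fly.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "asl_alphabet_train", image_size=IMG_SIZE, batch_size=32
).map(lambda x, y: (tf.keras.applications.resnet50.preprocess_input(x), y))

# ImageNet-pretrained backbone without its classification head.
base = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,)
)
base.trainable = False  # stage 1: train only the new classifier head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(train_ds, epochs=5)

# Stage 2: unfreeze the backbone and fine-tune at a much smaller learning rate.
base.trainable = True
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(train_ds, epochs=5)
```

The same pattern transfers directly to EfficientNet by swapping the backbone and its matching `preprocess_input` function.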
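The landmark branch can be sketched in a similar spirit. The snippet below uses the MediaPipe Hands solution to turn each frame into a 63-dimensional vector (21 keypoints x three coordinates) and stacks a short clip of these vectors for a small LSTM classifier; the sequence length, padding scheme, and layer sizes are assumptions for illustration, not the exact recurrent backbone reported here.

```python
# Hedged sketch of the hand-landmark branch: MediaPipe features over a clip,
# fed to a small LSTM. Sequence length and layer sizes are assumed values.
import numpy as np
import mediapipe as mp
import tensorflow as tf

SEQ_LEN = 30       # assumed number of frames per clip
NUM_CLASSES = 26   # assumed static ASL letter classes

mp_hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)

def landmarks_for_frame(rgb_frame):
    """Return a flat (63,) landmark vector, or zeros if no hand is detected."""
    result = mp_hands.process(rgb_frame)
    if not result.multi_hand_landmarks:
        return np.zeros(63, dtype=np.float32)
    hand = result.multi_hand_landmarks[0]
    return np.array(
        [[lm.x, lm.y, lm.z] for lm in hand.landmark], dtype=np.float32
    ).flatten()

def clip_to_sequence(frames):
    """Stack per-frame landmark vectors into a (SEQ_LEN, 63) array."""
    feats = [landmarks_for_frame(f) for f in frames[:SEQ_LEN]]
    while len(feats) < SEQ_LEN:                  # pad short clips with zeros
        feats.append(np.zeros(63, dtype=np.float32))
    return np.stack(feats)

# Recurrent model over the landmark sequence.
landmark_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, 63)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
```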
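The additional evaluation metrics are all available in scikit-learn; a minimal sketch follows, with placeholder label arrays standing in for real held-out predictions.

```python
# Metric sketch with scikit-learn; y_true / y_pred are placeholder arrays,
# not results from this work.
import numpy as np
from sklearn.metrics import cohen_kappa_score, f1_score, recall_score

y_true = np.array([0, 1, 2, 2, 1, 0])  # hypothetical held-out labels
y_pred = np.array([0, 1, 2, 1, 1, 0])  # hypothetical model predictions

macro_f1 = f1_score(y_true, y_pred, average="macro")
micro_f1 = f1_score(y_true, y_pred, average="micro")
kappa = cohen_kappa_score(y_true, y_pred)
per_class_recall = recall_score(y_true, y_pred, average=None)  # one value per class

print(f"macro F1={macro_f1:.3f}  micro F1={micro_f1:.3f}  kappa={kappa:.3f}")
print("per-class recall:", per_class_recall)
```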