Dual-View Sign Language Recognition via Front-View Guided Feature Fusion for Automatic Sign Language Training
Abstract
The foundation of an automatic sign language training (ASLT) system is word-level sign language recognition (WSLR), the translation of captured sign language signals into sign words. Two key issues remain open in this field: (1) the vocabularies of public sign language datasets are too small and do not match real-world scenarios, and (2) most datasets provide only single-view sign videos, which makes the problem of hand occlusion difficult to solve. In this work, we design an efficient WSLR algorithm trained on our recently released NationalCSL-DP dataset. The algorithm first performs frame-level alignment of the dual-view sign videos. A two-stage deep neural network then extracts the signers' spatiotemporal features, including hand motions and body gestures. Furthermore, a front-view guided early fusion (FvGEF) strategy is proposed to effectively fuse features from the different views. Extensive experiments show that the proposed algorithm significantly outperforms existing dual-view sign language recognition algorithms: its Top-1 accuracy on the NationalCSL6707 dataset is 10.29 and 11.38 percentage points higher than that of MViT and CNN + Transformer, respectively.
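To make the fusion idea concrete, the sketch below shows one plausible reading of a "front-view guided" early fusion: features from the front view gate the (occlusion-prone) side-view features before the two are concatenated. This is a hypothetical illustration only; the function name `fvgef_fuse`, the sigmoid gate, and the toy shapes are assumptions, not the paper's actual FvGEF implementation.

```python
import numpy as np

def fvgef_fuse(front_feat, side_feat):
    """Illustrative sketch of front-view guided early fusion (assumption):
    the front-view features produce a gate that reweights the side-view
    features, and the two views are then concatenated channel-wise."""
    # Sigmoid gate derived from the front-view features (hypothetical choice)
    gate = 1.0 / (1.0 + np.exp(-front_feat))
    # Suppress side-view channels where the front view carries little signal
    guided_side = gate * side_feat
    # Early fusion: concatenate along the feature dimension
    return np.concatenate([front_feat, guided_side], axis=-1)

# Toy frame-aligned features: T=4 frames, D=8 channels per view
rng = np.random.default_rng(0)
front = rng.standard_normal((4, 8))
side = rng.standard_normal((4, 8))
fused = fvgef_fuse(front, side)
print(fused.shape)  # (4, 16)
```

The gate keeps the fusion differentiable, so in a real two-stage network it could be learned end-to-end alongside the per-view feature extractors.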