A Comparative Study Employing Human Keypoints Estimation and Convolutional Neural Networks for Pakistan Sign Language Translation
Abstract
Sign Language Translation (SLT) bridges the communication gap between hearing-impaired and hearing individuals by translating sign language into spoken language. Each country has its own distinct sign language, with Pakistan Sign Language (PSL) serving the hearing-impaired community in Pakistan. Despite its importance, little work has been done on PSL translation. To address this gap, we applied keypoint estimation and video sequence-based techniques to PSL and conducted a performance analysis. To the best of our knowledge, this paper presents the first comparative study of these two SLT techniques applied specifically to PSL. The first technique leverages keypoints extracted with the MediaPipe library to generate glosses, while the second uses Convolutional Neural Networks (CNNs) to analyze entire video sequences and identify patterns corresponding to glosses. Both techniques rely on a pre-trained gloss-to-text translation model based on an attention-based Neural Machine Translation (NMT) system. The dataset used in this study consists of video sentences, split 80/20 into training and evaluation sets. Performance evaluation using BLEU scores reveals that the keypoints-based method outperforms the video sequence-based approach, achieving a BLEU score of 43.94 and demonstrating its efficiency in translating signs across diverse backgrounds.
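As a concrete illustration of the first pipeline, the sketch below extracts per-frame landmarks with MediaPipe (here the Holistic solution, covering pose and both hands). The abstract does not specify the landmark subset, feature layout, or downstream gloss classifier, so those choices, like the `extract_keypoints` helper itself, are illustrative assumptions rather than the authors' exact method.

```python
# Minimal sketch: per-frame keypoint extraction with MediaPipe Holistic.
# Feature layout (pose + both hands, zero-filled when undetected) is an assumption.
import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def extract_keypoints(video_path: str) -> np.ndarray:
    """Return an array of shape (num_frames, 33*4 + 21*3 + 21*3):
    pose (x, y, z, visibility) plus left/right hand (x, y, z) landmarks."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp_holistic.Holistic(static_image_mode=False) as holistic:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            pose = (np.array([[p.x, p.y, p.z, p.visibility]
                              for p in results.pose_landmarks.landmark]).flatten()
                    if results.pose_landmarks else np.zeros(33 * 4))
            lh = (np.array([[p.x, p.y, p.z]
                            for p in results.left_hand_landmarks.landmark]).flatten()
                  if results.left_hand_landmarks else np.zeros(21 * 3))
            rh = (np.array([[p.x, p.y, p.z]
                            for p in results.right_hand_landmarks.landmark]).flatten()
                  if results.right_hand_landmarks else np.zeros(21 * 3))
            frames.append(np.concatenate([pose, lh, rh]))
    cap.release()
    return np.stack(frames)
```

Because MediaPipe returns image-normalized coordinates rather than raw pixels, features of this kind discard background appearance, which is consistent with the abstract's observation that the keypoints-based method translates signs well across diverse backgrounds.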