Evaluation of Head Pose Estimation Algorithms for Sign Language Analysis

Abstract

Head pose estimation (HPE) can be used in sign language linguistics and gesture studies, particularly for the quantitative assessment of head movements in terms of yaw, pitch, and roll. These measurements make it possible to quantify and compare different types of head movements within and across sign languages and co-speech gesture. While optoelectronic camera-based motion capture systems are considered the gold standard for this purpose, their practicality is limited by high costs, accessibility issues, and the need for markers on the signer's face. Despite the popularity of HPE as a research area, there has been limited validation of HPE algorithms based on RGB videos, which are more suitable for sign language analysis, and almost no testing on sign-language-specific data. This study addresses this gap by providing an overview of existing HPE algorithms and evaluating three state-of-the-art algorithms—MediaPipe, OpenFace, and 6DRepNet—using RGB sign language videos from a MoCap dataset of Finnish Sign Language. The accuracy of these algorithms is compared against recordings of the same data from an optoelectronic camera-based motion capture system. The results indicate good performance by all three algorithms for measuring yaw (with some advantage for MediaPipe), and worse performance for measuring pitch and roll.
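The yaw, pitch, and roll values compared in the abstract are Euler-angle decompositions of a head rotation, which HPE algorithms and MoCap systems typically expose as a 3×3 rotation matrix. As a minimal sketch of how such a matrix maps to the three angles—assuming the common Z-Y-X (yaw-pitch-roll) convention, which varies between libraries and is not specified by the article—one can write:

```python
import numpy as np

def rotation_to_ypr(R):
    """Decompose a 3x3 rotation matrix into (yaw, pitch, roll) in degrees.

    Assumes R = Rz(yaw) @ Ry(pitch) @ Rx(roll), i.e. the Z-Y-X
    (yaw-pitch-roll) convention; HPE libraries differ on axis
    conventions, so this is an illustrative choice, not the
    article's definition.
    """
    yaw = np.arctan2(R[1, 0], R[0, 0])    # rotation about the vertical axis (head turn)
    pitch = np.arcsin(-R[2, 0])           # rotation about the lateral axis (head nod)
    roll = np.arctan2(R[2, 1], R[2, 2])   # rotation about the frontal axis (head tilt)
    return np.degrees([yaw, pitch, roll])

def ypr_to_rotation(yaw, pitch, roll):
    """Build the rotation matrix back from yaw, pitch, roll in degrees."""
    y, p, r = np.radians([yaw, pitch, roll])
    Rz = np.array([[np.cos(y), -np.sin(y), 0],
                   [np.sin(y),  np.cos(y), 0],
                   [0,          0,         1]])
    Ry = np.array([[ np.cos(p), 0, np.sin(p)],
                   [ 0,         1, 0        ],
                   [-np.sin(p), 0, np.cos(p)]])
    Rx = np.array([[1, 0,          0         ],
                   [0, np.cos(r), -np.sin(r)],
                   [0, np.sin(r),  np.cos(r)]])
    return Rz @ Ry @ Rx
```

Comparing an HPE algorithm against a MoCap reference then reduces to computing per-frame differences between the two angle triples (after aligning coordinate conventions), which is the kind of comparison the study performs.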