EnvisionHGdetector: A Computational Framework for Co-Speech Gesture Detection, Kinematic Analysis, and Interactive Visualization
Abstract
We introduce the EnvisionHGdetector toolkit (v3.0.4), which allows for the kinematic analysis of automatically detected co-speech gestures from single-person videos. The Convolutional Neural Network (CNN) and Light Gradient Boosting Machine (LightGBM) models for detecting gestures are trained on seven open datasets with varying recording conditions: ZHUBO, SaGA, SaGA++, MULTISIMO, ECOLANG, TEDM3D, and GESRes. Combined, these data yielded 7774 unique gesture instances amenable to data augmentation, pose extraction, and model training. Inspired by and building further on the action detection tool Nodding Pigeon (Yung, 2022), EnvisionHGdetector utilizes MediaPipe Holistic for skeleton-based tracking and a CNN/LightGBM model for classification. The tool is designed to work out of the box and is broadly applicable to any single-person frontal-view recording, without custom (p)re-training on specific target datasets and without needing GPU support at inference. On our test set, we attain a class-balanced gesture detection accuracy of 60.1% with the CNN model and 69.4% with the LightGBM model. Additionally, on our out-of-sample test set, the CNN achieves an average accuracy of 81.3%, while LightGBM achieves 75.0%. The envisionHGdetector PyPI package outputs gesture segment labels, label confidence time series, segmented videos, ELAN files, dynamic time warping (DTW) distance matrices for gesture similarities, kinematic feature summaries, and a data visualization and exploration dashboard. The code for the package and training is available on GitHub, and the training feature data is archived at the OSF. We further discuss this project's outlook, which aims to improve the integration of open-source computational innovations in gesture studies and cognitive science.
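To make the notion of a DTW distance matrix for gesture similarity concrete, the following is a minimal illustrative sketch in plain Python. It is not the package's implementation: the function names `dtw_distance` and `dtw_matrix`, the toy one-dimensional trajectories, and the absolute-difference local cost are all assumptions made for illustration. The toolkit may well operate on multivariate keypoint trajectories and apply its own normalization.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    Hypothetical helper for illustration; uses |x - y| as the local cost.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def dtw_matrix(series):
    """Symmetric matrix of pairwise DTW distances with a zero diagonal."""
    k = len(series)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            matrix[i][j] = matrix[j][i] = dtw_distance(series[i], series[j])
    return matrix


# Toy data (assumed): e.g. the vertical wrist position of three detected gestures.
gestures = [
    [0.0, 1.0, 2.0, 1.0, 0.0],       # up-down stroke
    [0.0, 0.0, 1.0, 2.0, 1.0, 0.0],  # same stroke, delayed by one frame
    [2.0, 2.0, 2.0],                 # static hold
]
matrix = dtw_matrix(gestures)
```

In this toy case the first two trajectories differ only in timing, so their DTW distance is zero, whereas the static hold is farther from both. Insensitivity to timing differences of this kind is what makes DTW a natural choice for comparing gestures of unequal duration.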