EnvisionHGdetector: A Computational Framework for Co-Speech Gesture Detection, Kinematic Analysis, and Interactive Visualization


Abstract

We introduce the EnvisionHGdetector toolkit (v1.0.0.1), which allows for the kinematic analysis of automatically detected co-speech gestures from single-person videos. The convolutional neural network model for detecting gestures is trained using TensorFlow on five open datasets – ZHUBO, SaGA, MULTISIMO, ECOLANG, and TEDM3D – which, combined, represent over 8,000 gesture instances recorded under varying conditions. Inspired by and further building on the action detection tool Nodding Pigeon [57], the toolkit uses MediaPipe Holistic for skeleton-based tracking and a CNN model for classification, and it runs efficiently on CPU only. The tool is designed to work out of the box for any single-person frontal-view recording, without custom (p)re-training on specific target datasets. On our test set, we attain an average gesture detection accuracy of 73-78% (False Positive Rate 21-33%, False Negative Rate 36-48%). The envisionHGdetector PyPI package produces as output gesture segment labels, label confidence time series, segmented videos, as well as ELAN files, DTW distance matrices for gesture similarities, kinematic feature summaries, and a data visualization and exploration dashboard. The code for the package and training is available on GitHub, and the training feature data is archived at the OSF. We further discuss this project's outlook, which aims to improve the integration of open-source computational innovations in gesture studies.
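To give a sense of the out-of-the-box workflow described above, the sketch below shows how a batch run over a folder of single-person videos might look in Python. The class name GestureDetector, the method process_folder, and its parameters are illustrative assumptions rather than the confirmed package API; the authoritative interface is documented in the package's GitHub repository and PyPI page.

    # Minimal usage sketch, assuming a hypothetical envisionhgdetector API.
    # Names below (GestureDetector, process_folder, input_folder, output_folder)
    # are illustrative assumptions; consult the package docs for the real interface.
    from envisionhgdetector import GestureDetector  # assumed entry-point class

    # Instantiate with default settings (no re-training on target data needed).
    detector = GestureDetector()

    # Assumed batch call: detect gestures in all videos in "videos/" and write
    # the outputs described in the abstract (segment labels, confidence time
    # series, segmented videos, ELAN files, DTW matrices, kinematic summaries)
    # to "output/".
    detector.process_folder(
        input_folder="videos/",
        output_folder="output/",
    )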
