EnvisionHGdetector: A Computational Framework for Co-Speech Gesture Detection, Kinematic Analysis, and Interactive Visualization

Wim Pouw
Sharjeel Ahmed Shaikh
James Trujillo
Bosco Yung
Antonio Rueda-Toicen
Gerard de Melo
Babajide Owoyele

Abstract

We introduce the EnvisionHGdetector toolkit (v3.0.4), which allows for the kinematic analysis of automatically detected co-speech gestures from single-person videos. The Convolutional Neural Network (CNN) and Light Gradient Boosting Machine (LightGBM) models for detecting gestures are trained on seven open datasets with varying recording conditions: ZHUBO, SaGA, SaGA++, MULTISIMO, ECOLANG, TEDM3D, and GESRes. Combined, these data yielded 7774 unique gesture instances amenable to data augmentation, pose extraction, and model training. Inspired by and building further on the action detection tool Nodding Pigeon (Yung, 2022), EnvisionHGdetector uses MediaPipe Holistic for skeleton-based tracking and a CNN/LightGBM model for classification. The tool is designed to work out of the box and to be broadly applicable to any single-person frontal-view recording, without custom (p)re-training on specific target datasets and without needing GPU support at inference. On our test set, we attain a class-balanced gesture detection accuracy of 60.1% with the CNN model and 69.4% with the LightGBM model. Additionally, on our out-of-sample test set, the CNN achieves an average accuracy of 81.3%, while LightGBM achieves 75.0%. The envisionHGdetector PyPI package outputs gesture segment labels, label confidence time series, segmented videos, ELAN files, DTW distance matrices for gesture similarities, kinematic feature summaries, and a data visualization and exploration dashboard. The code for the package and training is available on GitHub, and the training feature data is archived at the OSF. We further discuss this project's outlook, which aims to improve the integration of open-source computational innovations in gesture studies and cognitive science.
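To make the "DTW distance matrices for gesture similarities" output more concrete, the following is a minimal sketch of textbook dynamic time warping (DTW) applied pairwise to a set of gesture trajectories. It is not the package's own implementation, which the abstract does not describe: the function names `dtw_distance` and `dtw_distance_matrix`, the Euclidean local cost, and the toy 2D trajectories are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

# A trajectory is a sequence of points; each point is a sequence of floats
# (e.g. the x, y position of a wrist keypoint in one video frame).
Point = Sequence[float]
Trajectory = Sequence[Point]


def dtw_distance(a: Trajectory, b: Trajectory) -> float:
    """Textbook DTW with Euclidean local cost (hypothetical helper)."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("trajectories must be non-empty")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def dtw_distance_matrix(gestures: Sequence[Trajectory]) -> List[List[float]]:
    """Symmetric matrix of pairwise DTW distances (hypothetical helper)."""
    k = len(gestures)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            matrix[i][j] = matrix[j][i] = dtw_distance(gestures[i], gestures[j])
    return matrix


# Toy data, invented for illustration only.
g1 = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
g2 = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]  # same path, slower start
g3 = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]              # same shape, shifted up
distances = dtw_distance_matrix([g1, g2, g3])
```

In this toy case `g1` and `g2` trace the same path at different speeds, so their DTW distance is zero, whereas the spatially shifted `g3` is at a nonzero distance from both. That tolerance to timing differences is the usual motivation for using DTW to compare gestures of unequal duration.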

Version published to 10.31234/osf.io/psg5f_v2 on OSF Preprints
Apr 19, 2026
Version published to 10.31234/osf.io/psg5f_v1 on OSF Preprints
Feb 26, 2025

Related articles

Research on Lightweight dynamic gesture recognition model driven by Meta-learning under Small Sample conditions

This article has 3 authors:
1. Yaxu Xue
2. Weidi Huang
3. Chunbiao Gan
This article has no evaluations. Latest version: Apr 17, 2026

Automated Yoga Pose Classification Using Deep Learning on Image-Based Datasets

This article has 8 authors:
1. Anish Antony
2. M.A.H. Farquad
3. Ashvini Alashetty
4. Sachin Kumar
5. Punitkumar Basavaraj Nayak
6. Geethanjali P P
7. sachin sharma
8. ( Kamanuri Sekhar) Sekhar K. Sekhar
This article has no evaluations. Latest version: Apr 14, 2026

Deep Learning-Based Framework for Filtering Objectionable Scenes in Cartoon Videos

This article has 8 authors:
1. Irshad Ullah
2. Sameed ur Rehman
3. Wajahat Akbar
4. Altaf Hussain
5. Raaz Waheeb Attar
6. Ruzat Ullah
7. Tariq Hussain
8. Amal Hassan Alhazmi
This article has no evaluations. Latest version: Apr 16, 2026
