Modeling the Gesture-Speech Relation Through Novel Datasets for Multimodal Signal Analysis
Abstract
We present preliminary results of a new methodology for studying co-speech gesture in relation to specific linguistic structures. We draw on a large-scale video repository with time-aligned transcripts to build corpora in which the same linguistic expression is uttered by different speakers across multiple clips. We then extract dynamic coordinates of key body points to model their variation in relation to what is being said. In this paper, we analyze the distribution of wrist motion in gesture space for a dataset of 379 videos containing utterances of 44 deictic time expressions in English (words or phrases pointing at the past, present, or future in relation to a center of temporal reference, e.g. “yesterday/today/tomorrow”). Even the overall distributions of wrist positions in peripheral areas of gestural motion turn out to be influenced by these semantic distinctions. More fine-grained models can be expected from the reconstruction of gestural trajectories, based on the chronological sequence of positions detected in each video. These initial results already suggest that so-called non-verbal behavior is deeply structured and attuned to language, well beyond our current understanding. Once scaled up, such models have the potential to dramatically change technologies connected to human communicative behavior.
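The core pipeline described above (extracting dynamic coordinates of key body points from each clip and ordering the wrist positions chronologically into a trajectory) could be sketched roughly as follows. This is a minimal illustrative sketch under stated assumptions: the abstract does not name the pose-estimation toolchain, so MediaPipe Pose, OpenCV, and the helper name `wrist_trajectory` are stand-ins rather than the authors' implementation.

```python
# Illustrative sketch only: the paper's actual pose-estimation pipeline is not
# specified here; MediaPipe Pose is used purely as a stand-in.
import cv2
import mediapipe as mp


def wrist_trajectory(video_path: str) -> list[tuple[float, float]]:
    """Return the chronological sequence of (x, y) right-wrist positions,
    in normalized [0, 1] image coordinates, one entry per frame with a
    detected pose."""
    mp_pose = mp.solutions.pose
    trajectory: list[tuple[float, float]] = []
    cap = cv2.VideoCapture(video_path)
    with mp_pose.Pose(static_image_mode=False) as pose:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks:
                wrist = results.pose_landmarks.landmark[
                    mp_pose.PoseLandmark.RIGHT_WRIST
                ]
                trajectory.append((wrist.x, wrist.y))
    cap.release()
    return trajectory
```

Given such per-clip trajectories, the distributional analyses mentioned in the abstract would then aggregate wrist positions across all clips sharing the same deictic expression and compare their spread over regions of gesture space.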