Modeling the Gesture-Speech Relation Through Novel Datasets for Multimodal Signal Analysis
Abstract
We present preliminary results of a new methodology for studying co-speech gesture in relation to specific linguistic structures. We draw on a large-scale video repository with time-aligned transcripts to build corpora in which the same linguistic expression is uttered by different speakers across multiple clips. We then extract dynamic coordinates of key body points to model their variation in relation to what is being said. In this paper, we analyze the distribution of wrist motion in gesture space for a dataset of 379 videos containing utterances of 44 deictic time expressions in English (words or phrases pointing at the past, present, or future in relation to a center of temporal reference, e.g. “yesterday/today/tomorrow”). Even the overall distributions of wrist positions in peripheral areas of gestural motion turn out to be influenced by these semantic distinctions. More fine-grained models can be expected from the reconstruction of gestural trajectories, based on the chronological sequence of positions detected in each video. These initial results already suggest that so-called non-verbal behavior is deeply structured and attuned to language, well beyond our current understanding. Once scaled up, such models have the potential to dramatically change technologies connected to human communicative behavior.
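The core pipeline described above (extracting dynamic coordinates of key body points from each clip and ordering the wrist positions chronologically into a trajectory) could be sketched roughly as follows. This is a minimal illustrative sketch under stated assumptions: the abstract does not name the pose-estimation toolchain, so MediaPipe Pose, OpenCV, and the helper name `wrist_trajectory` are stand-ins rather than the authors' implementation.

```python
# Illustrative sketch only: the paper's actual pose-estimation pipeline is not
# specified here; MediaPipe Pose is used purely as a stand-in.
import cv2
import mediapipe as mp


def wrist_trajectory(video_path: str) -> list[tuple[float, float]]:
    """Return the chronological sequence of (x, y) right-wrist positions,
    in normalized [0, 1] image coordinates, one entry per frame with a
    detected pose."""
    mp_pose = mp.solutions.pose
    trajectory: list[tuple[float, float]] = []
    cap = cv2.VideoCapture(video_path)
    with mp_pose.Pose(static_image_mode=False) as pose:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks:
                wrist = results.pose_landmarks.landmark[
                    mp_pose.PoseLandmark.RIGHT_WRIST
                ]
                trajectory.append((wrist.x, wrist.y))
    cap.release()
    return trajectory
```

Given such per-clip trajectories, the distributional analyses mentioned in the abstract would then aggregate wrist positions across all clips sharing the same deictic expression and compare their spread over regions of gesture space.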