Prior expectations guide multisensory integration during face-to-face communication



Abstract

Face-to-face communication relies on the seamless integration of multisensory signals, including voice, gaze, and head movements, to convey meaning effectively. This poses a fundamental computational challenge: optimally binding signals sharing the same communicative intention (e.g. looking at the addressee while speaking) and segregating unrelated signals (e.g. looking away while coughing), all within the rapid turn-taking dynamics of conversation. Critically, the computational mechanisms underlying this extraordinary feat remain largely unknown. Here, we cast face-to-face communication as a Bayesian Causal Inference problem to formally test whether prior expectations arbitrate between the integration and segregation of vocal and bodily signals. Moreover, we asked whether there is a stronger prior tendency to integrate audiovisual signals that show the same communicative intention, thus carrying a crossmodal pragmatic correspondence. In a spatial localization task, participants watched audiovisual clips of a speaker in which the audio (voice) and the video (bodily cues) were sampled either from congruent positions or at increasing spatial disparities. Crucially, we manipulated the pragmatic correspondence of the signals: in a communicative condition, the speaker addressed the participant with their head, gaze and speech; in a non-communicative condition, the speaker kept the head down and produced a meaningless vocalization. We measured audiovisual integration through the ventriloquist effect, which quantifies how much the perceived audio position is shifted towards the video position. Bayesian Causal Inference outperformed competing models in explaining participants' behaviour, demonstrating that prior expectations guide multisensory integration during face-to-face communication. Remarkably, participants showed a stronger prior tendency to integrate vocal and bodily information when signals conveyed congruent communicative intent, suggesting that pragmatic correspondences enhance multisensory integration. Collectively, our findings provide novel and compelling evidence that face-to-face communication is shaped by deeply ingrained expectations about how multisensory signals should be structured and interpreted.
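The Bayesian Causal Inference scheme described above can be sketched as follows. This is a minimal illustration based on the standard formulation of the model (Körding et al., 2007), not the authors' actual implementation; the function name, the parameter values, and the use of model averaging as the decision rule are assumptions for the sake of the example. The prior probability of a common cause, `p_common`, is the quantity the study compares across communicative and non-communicative conditions.

```python
import math

def bci_auditory_estimate(x_a, x_v, sigma_a, sigma_v, sigma_p, mu_p, p_common):
    """Estimate the perceived auditory location under Bayesian Causal Inference.

    x_a, x_v  : noisy auditory and visual spatial samples
    sigma_a/v : auditory and visual sensory noise (std. dev.)
    sigma_p   : width of the spatial prior centred on mu_p
    p_common  : prior probability that both signals share one cause
    Returns (auditory estimate, posterior probability of a common cause).
    """
    va, vv, vp = sigma_a**2, sigma_v**2, sigma_p**2

    # Likelihood of the two samples under a common cause (C = 1),
    # integrating over the shared source location.
    var_c1 = va * vv + va * vp + vv * vp
    like_c1 = math.exp(
        -0.5 * ((x_a - x_v)**2 * vp + (x_a - mu_p)**2 * vv
                + (x_v - mu_p)**2 * va) / var_c1
    ) / (2 * math.pi * math.sqrt(var_c1))

    # Likelihood under two independent causes (C = 2).
    like_c2 = math.exp(
        -0.5 * ((x_a - mu_p)**2 / (va + vp) + (x_v - mu_p)**2 / (vv + vp))
    ) / (2 * math.pi * math.sqrt((va + vp) * (vv + vp)))

    # Posterior probability of a common cause (Bayes' rule).
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

    # Reliability-weighted optimal estimates under each causal structure.
    s_c1 = (x_a / va + x_v / vv + mu_p / vp) / (1 / va + 1 / vv + 1 / vp)  # fuse
    s_c2 = (x_a / va + mu_p / vp) / (1 / va + 1 / vp)                      # segregate

    # Model averaging: weight each estimate by its causal posterior.
    return post_c1 * s_c1 + (1 - post_c1) * s_c2, post_c1
```

In this sketch the ventriloquist effect is the difference between the returned auditory estimate and `x_a`: with a small audiovisual disparity the estimate is pulled towards the video position, and a larger `p_common` (as hypothesized for the communicative condition) produces a stronger pull, while large disparities drive the causal posterior towards segregation.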

Author summary

Face-to-face communication is complex: what we say is coupled with bodily signals, offset in time, which may or may not work in concert to convey meaning. Yet the brain rapidly determines which multisensory signals belong together and which, instead, must be kept apart, suggesting that prior expectations play a crucial role in this decision-making process. Here, we directly tested this hypothesis using Bayesian computational modelling, which allows us to isolate the contributions of prior expectations and sensory uncertainty to the final perceptual decision. We found that people have a stronger prior tendency to combine vocal and bodily signals when they convey the same communicative intent (i.e. the speaker addresses the observer concurrently with their head, gaze and speech) relative to when this correspondence is absent. Thus, the brain uses prior expectations to bind multisensory signals that carry converging communicative meaning. These findings provide key insight into the sophisticated mechanisms underpinning efficient multimodal communication.
