Prior expectations guide multisensory integration during face-to-face communication



Abstract

Face-to-face communication relies on the seamless integration of multisensory signals, including voice, gaze, and head movements, to convey meaning effectively. This poses a fundamental computational challenge: optimally binding signals sharing the same communicative intention (e.g. looking at the addressee while speaking) and segregating unrelated signals (e.g. looking away while coughing), all within the rapid turn-taking dynamics of conversation. Critically, the computational mechanisms underlying this extraordinary feat remain largely unknown. Here, we cast face-to-face communication as a Bayesian Causal Inference problem to formally test whether prior expectations arbitrate between the integration and segregation of vocal and bodily signals. Moreover, we asked whether there is a stronger prior tendency to integrate audiovisual signals that show the same communicative intention, thus carrying a crossmodal pragmatic correspondence. In a spatial localization task, participants watched audiovisual clips of a speaker in which the audio (voice) and the video (bodily cues) were sampled either from congruent positions or at increasing spatial disparities. Crucially, we manipulated the pragmatic correspondence of the signals: in a communicative condition, the speaker addressed the participant with their head, gaze and speech; in a non-communicative condition, the speaker kept the head down and produced a meaningless vocalization. We measured audiovisual integration through the ventriloquist effect, which quantifies how much the perceived audio position is shifted towards the video position. Bayesian Causal Inference outperformed competing models in explaining participants' behaviour, demonstrating that prior expectations guide multisensory integration during face-to-face communication. Remarkably, participants showed a stronger prior tendency to integrate vocal and bodily information when signals conveyed congruent communicative intent, suggesting that pragmatic correspondences enhance multisensory integration. Collectively, our findings provide novel and compelling evidence that face-to-face communication is shaped by deeply ingrained expectations about how multisensory signals should be structured and interpreted.
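The Bayesian Causal Inference scheme described above can be sketched as follows. This is a minimal illustration based on the standard formulation of the model (Körding et al., 2007), not the authors' actual implementation; the function name, the parameter values, and the use of model averaging as the decision rule are assumptions for the sake of the example. The prior probability of a common cause, `p_common`, is the quantity the study compares across communicative and non-communicative conditions.

```python
import math

def bci_auditory_estimate(x_a, x_v, sigma_a, sigma_v, sigma_p, mu_p, p_common):
    """Estimate the perceived auditory location under Bayesian Causal Inference.

    x_a, x_v  : noisy auditory and visual spatial samples
    sigma_a/v : auditory and visual sensory noise (std. dev.)
    sigma_p   : width of the spatial prior centred on mu_p
    p_common  : prior probability that both signals share one cause
    Returns (auditory estimate, posterior probability of a common cause).
    """
    va, vv, vp = sigma_a**2, sigma_v**2, sigma_p**2

    # Likelihood of the two samples under a common cause (C = 1),
    # integrating over the shared source location.
    var_c1 = va * vv + va * vp + vv * vp
    like_c1 = math.exp(
        -0.5 * ((x_a - x_v)**2 * vp + (x_a - mu_p)**2 * vv
                + (x_v - mu_p)**2 * va) / var_c1
    ) / (2 * math.pi * math.sqrt(var_c1))

    # Likelihood under two independent causes (C = 2).
    like_c2 = math.exp(
        -0.5 * ((x_a - mu_p)**2 / (va + vp) + (x_v - mu_p)**2 / (vv + vp))
    ) / (2 * math.pi * math.sqrt((va + vp) * (vv + vp)))

    # Posterior probability of a common cause (Bayes' rule).
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

    # Reliability-weighted optimal estimates under each causal structure.
    s_c1 = (x_a / va + x_v / vv + mu_p / vp) / (1 / va + 1 / vv + 1 / vp)  # fuse
    s_c2 = (x_a / va + mu_p / vp) / (1 / va + 1 / vp)                      # segregate

    # Model averaging: weight each estimate by its causal posterior.
    return post_c1 * s_c1 + (1 - post_c1) * s_c2, post_c1
```

In this sketch the ventriloquist effect is the difference between the returned auditory estimate and `x_a`: with a small audiovisual disparity the estimate is pulled towards the video position, and a larger `p_common` (as hypothesized for the communicative condition) produces a stronger pull, while large disparities drive the causal posterior towards segregation.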

Author summary

Face-to-face communication is complex: what we say is coupled with bodily signals, offset in time, which may or may not work in concert to convey meaning. Yet the brain rapidly determines which multisensory signals belong together and which, instead, must be kept apart, suggesting that prior expectations play a crucial role in this decision-making process. Here, we directly tested this hypothesis using Bayesian computational modelling, which allows us to isolate the contributions of prior expectations and sensory uncertainty to the final perceptual decision. We found that people have a stronger prior tendency to combine vocal and bodily signals when they convey the same communicative intent (i.e. the speaker addresses the observer concurrently with their head, gaze and speech) relative to when this correspondence is absent. Thus, the brain uses prior expectations to bind multisensory signals that carry converging communicative meaning. These findings provide key insight into the sophisticated mechanisms underpinning efficient multimodal communication.
