Discourse context and co-speech gestures jointly shape hierarchical prediction during the processing of a multimodal narrative
Abstract
Understanding one another in daily communication depends on predicting language from prior discourse context and visual signals, such as co-speech gestures. However, it remains unclear how discourse context and gestures jointly shape neural predictions during naturalistic language processing. Here, participants watched multimodal narratives containing spontaneously produced gestures during fMRI scanning. Leveraging transformer-based computational modeling, we disentangled linguistic uncertainty from contextual informativity at the sentence level, and observed that these dissociated measures engaged neural regions associated with predictive processing across multiple levels of multimodal representation. Further, greater gesture availability reduced the neural cost associated with these predictive processes and lessened reliance on discourse context, revealing a push-pull synergy between gestures and context. Our findings extend hierarchical predictive processing frameworks, demonstrating that gestures and discourse jointly, rather than additively, constrain neural predictions at multiple representational scales. These results underscore the critical and dynamically integrated role of multimodal predictive mechanisms in everyday communication.
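As a rough illustration of how the two sentence-level measures named in the abstract can be separated with a transformer language model, the sketch below estimates per-token surprisal (a proxy for contextual informativity) and next-token entropy (a proxy for linguistic uncertainty) and averages them over a sentence. This is a minimal sketch only: GPT-2, the Hugging Face `transformers` API, and mean aggregation over tokens are assumptions standing in for whatever model and aggregation the authors actually used.

```python
# Hedged sketch: GPT-2 and mean-over-tokens aggregation are assumptions,
# not the paper's actual pipeline.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_metrics(context: str, sentence: str):
    """Return mean surprisal and mean next-token entropy (both in bits)
    for `sentence`, conditioned on the preceding discourse `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    sent_ids = tokenizer(sentence, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, sent_ids], dim=1)

    with torch.no_grad():
        logits = model(input_ids).logits  # shape: (1, seq_len, vocab)

    surprisals, entropies = [], []
    for pos in range(ctx_ids.shape[1], input_ids.shape[1]):
        # Predictive distribution over the token at `pos`, given tokens < pos.
        probs = torch.softmax(logits[0, pos - 1], dim=-1)
        token_id = input_ids[0, pos]
        surprisals.append(-torch.log2(probs[token_id]).item())       # informativity proxy
        entropies.append(-(probs * torch.log2(probs)).sum().item())  # uncertainty proxy

    return sum(surprisals) / len(surprisals), sum(entropies) / len(entropies)

# Example usage with a hypothetical narrative fragment:
surprisal, entropy = sentence_metrics(
    "The speaker gestured toward the map while describing the route.",
    " Then she traced the river with her finger.",
)
print(f"mean surprisal: {surprisal:.2f} bits, mean entropy: {entropy:.2f} bits")
```

Because entropy is computed before each token is observed and surprisal after, the two quantities can diverge, which is what makes it possible to relate them to distinct neural signatures of prediction.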