Two cortical mechanisms of audiovisual processing in the human brain
Abstract
Understanding how the human brain processes naturalistic audiovisual information remains a central challenge in cognitive neuroscience. Progress has been limited, however, by the difficulty of modeling complex audiovisual feature spaces: most prior work has therefore relied on short, controlled stimuli, or on stimuli from one modality at a time, leaving the cortical mechanisms that support real-world comprehension poorly characterized. Further, while recent advances in artificial intelligence now enable the extraction of high-dimensional, time-resolved features from naturalistic stimuli, how cortical regions dynamically process auditory and visual information as time unfolds remains largely unexplored. Using large-scale fMRI data collected while participants watched movies, we developed two complementary computational approaches, both based on prediction performance, to map the moment-by-moment dynamics of sensory processing across cortical regions: one tracks when one modality predicts a region's activity substantially better than the other, capturing temporal transitions in modality dominance, while the other identifies periods when both modalities predict the region well, indicating balanced representation of auditory and visual information. Together, these analyses reveal two complementary patterns of audiovisual organization across cortex: a pair of "bows" of modality switching (a posterior bow encircling category-selective visual cortex and an anterior bow spanning dorsolateral frontal areas), and an arrow-like axis of bimodally predicted regions extending from lateral occipital cortex into temporal cortex. The coexistence of these systems points to a cortical architecture that flexibly reweights sensory inputs while maintaining balanced multimodal representations, supporting robust comprehension of complex natural events. More broadly, this work illustrates how naturalistic neuroimaging experiments informed by modern machine learning approaches can reveal new principles of dynamic audiovisual processing in the human brain.
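
To make the two prediction-based analyses concrete, the sketch below (not the authors' code) shows one way time-resolved prediction performance from an audio-only and a visual-only encoding model could be compared for a single cortical region. The windowed correlation metric, window length, and the `dominance_gap` and `bimodal_floor` thresholds are illustrative assumptions, not parameters reported in the abstract.

```python
# Minimal sketch, assuming sliding-window correlation as the prediction-performance
# measure; all thresholds and window sizes below are hypothetical.
import numpy as np

def windowed_corr(pred, bold, win=30):
    """Correlation between predicted and observed responses in sliding windows."""
    n = len(bold) - win + 1
    r = np.empty(n)
    for t in range(n):
        r[t] = np.corrcoef(pred[t:t + win], bold[t:t + win])[0, 1]
    return r

def label_windows(r_audio, r_visual, dominance_gap=0.2, bimodal_floor=0.3):
    """Label each window as 'audio', 'visual', 'bimodal', or 'neither'."""
    labels = np.full(len(r_audio), "neither", dtype=object)
    labels[r_audio > r_visual + dominance_gap] = "audio"    # audio-dominant periods
    labels[r_visual > r_audio + dominance_gap] = "visual"   # visual-dominant periods
    both = (r_audio > bimodal_floor) & (r_visual > bimodal_floor) & \
           (np.abs(r_audio - r_visual) <= dominance_gap)
    labels[both] = "bimodal"                                 # balanced bimodal periods
    return labels

# Usage with synthetic data standing in for one region's time course (in TRs):
rng = np.random.default_rng(0)
bold = rng.standard_normal(600)                 # observed regional activity
pred_audio = bold + rng.standard_normal(600)    # stand-in audio-model predictions
pred_visual = bold + 2 * rng.standard_normal(600)  # stand-in visual-model predictions
r_a = windowed_corr(pred_audio, bold)
r_v = windowed_corr(pred_visual, bold)
print(np.unique(label_windows(r_a, r_v), return_counts=True))
```

In this framing, frequent alternation between "audio" and "visual" labels would mark the modality-switching regions forming the two bows, while a preponderance of "bimodal" labels would mark regions along the occipito-temporal axis where both modalities predict activity well.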