Two cortical mechanisms of audiovisual processing in the human brain
Abstract
Understanding how the human brain processes naturalistic audiovisual information remains a central challenge in cognitive neuroscience. Progress has been limited, however, by the difficulty of modeling complex audiovisual feature spaces: most prior work has therefore relied on short, controlled stimuli, or on stimuli from one modality at a time, leaving the cortical mechanisms that support real-world comprehension poorly characterized. Further, while recent advances in artificial intelligence now enable the extraction of high-dimensional, time-resolved features from naturalistic stimuli, how cortical regions dynamically process auditory and visual information as time unfolds remains largely unexplored. Using large-scale fMRI data collected while participants watched movies, we developed two complementary computational approaches, both based on prediction performance, to map the moment-by-moment dynamics of sensory processing across cortical regions: one tracks when one modality predicts a region's activity substantially better than the other, capturing temporal transitions in modality dominance, while the other identifies periods when both modalities predict the region well, indicating balanced representation of auditory and visual information. Together, these analyses reveal two complementary patterns of audiovisual organization across cortex: a pair of "bows" of modality switching (a posterior bow encircling category-selective visual cortex and an anterior bow spanning dorsolateral frontal areas), and an arrow-like axis of bimodally predicted regions extending from lateral occipital cortex into temporal cortex. The coexistence of these systems points to a cortical architecture that flexibly reweights sensory inputs while maintaining balanced multimodal representations, supporting robust comprehension of complex natural events. More broadly, this work illustrates how naturalistic neuroimaging experiments informed by modern machine learning approaches can reveal new principles of dynamic audiovisual processing in the human brain.
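
To make the two prediction-based analyses concrete, the sketch below (not the authors' code) shows one way time-resolved prediction performance from an audio-only and a visual-only encoding model could be compared for a single cortical region. The windowed correlation metric, window length, and the `dominance_gap` and `bimodal_floor` thresholds are illustrative assumptions, not parameters reported in the abstract.

```python
# Minimal sketch, assuming sliding-window correlation as the prediction-performance
# measure; all thresholds and window sizes below are hypothetical.
import numpy as np

def windowed_corr(pred, bold, win=30):
    """Correlation between predicted and observed responses in sliding windows."""
    n = len(bold) - win + 1
    r = np.empty(n)
    for t in range(n):
        r[t] = np.corrcoef(pred[t:t + win], bold[t:t + win])[0, 1]
    return r

def label_windows(r_audio, r_visual, dominance_gap=0.2, bimodal_floor=0.3):
    """Label each window as 'audio', 'visual', 'bimodal', or 'neither'."""
    labels = np.full(len(r_audio), "neither", dtype=object)
    labels[r_audio > r_visual + dominance_gap] = "audio"    # audio-dominant periods
    labels[r_visual > r_audio + dominance_gap] = "visual"   # visual-dominant periods
    both = (r_audio > bimodal_floor) & (r_visual > bimodal_floor) & \
           (np.abs(r_audio - r_visual) <= dominance_gap)
    labels[both] = "bimodal"                                 # balanced bimodal periods
    return labels

# Usage with synthetic data standing in for one region's time course (in TRs):
rng = np.random.default_rng(0)
bold = rng.standard_normal(600)                 # observed regional activity
pred_audio = bold + rng.standard_normal(600)    # stand-in audio-model predictions
pred_visual = bold + 2 * rng.standard_normal(600)  # stand-in visual-model predictions
r_a = windowed_corr(pred_audio, bold)
r_v = windowed_corr(pred_visual, bold)
print(np.unique(label_windows(r_a, r_v), return_counts=True))
```

In this framing, frequent alternation between "audio" and "visual" labels would mark the modality-switching regions forming the two bows, while a preponderance of "bimodal" labels would mark regions along the occipito-temporal axis where both modalities predict activity well.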