Scene segmentation processes drive EEG-DCNN alignment
Abstract
Visual processing in biological and artificial neural networks has been extensively studied through the lens of object recognition. While deep convolutional neural networks (DCNNs) have demonstrated hierarchical feature extraction similar to biological systems (DiCarlo and Cox, 2007; Yamins and DiCarlo, 2016), recent findings reveal a growing discrepancy: DCNNs with higher object categorization accuracy paradoxically show worse performance at predicting neural responses (Xu and Vaziri-Pashkam, 2021; Linsley et al., 2023). Using a large-scale human electroencephalography (EEG) dataset (n = 10; 82,160 trials), we investigate whether this discrepancy arises because human EEG signals predominantly reflect scene segmentation processes rather than high-level, category-specific object representations. We trained DCNNs to perform object recognition on visual diets (∼1 million training images across 292 object categories) with systematically varying scene segmentation demands: objects-only images (pre-segmented), background-silhouette images (explicit boundaries), and original or background-only images (requiring full segmentation). Despite substantial differences in categorization accuracy (27–53%), all trained models showed remarkably uniform encoding performance, with correlations with neural data peaking at ∼0.1 s post-stimulus. Layer-wise analysis revealed a significant negative correlation between categorization accuracy and encoding performance, with earlier network layers predicting EEG responses better than deeper layers specialized for object categorization. This dissociation suggests that EEG signals primarily reflect fundamental scene parsing mechanisms rather than object-specific representations, explaining why DCNNs' categorization performance keeps improving while their neural prediction performance deteriorates.
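The layer-wise encoding analysis described above can be illustrated with a minimal sketch. This is not the paper's exact pipeline; it assumes a common approach in which DCNN layer activations predict EEG amplitudes at a single time point via cross-validated ridge regression, and the encoding score is the Pearson correlation between predicted and held-out responses. The function name `encoding_score`, the regularization strength, and the synthetic data are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def encoding_score(layer_feats, eeg, n_splits=5, alpha=1.0):
    """Cross-validated encoding score for one EEG channel/time point.

    Fits a ridge model from DCNN layer activations (n_images x n_features)
    to EEG amplitudes (n_images,), then correlates predictions with
    held-out responses; returns the mean Pearson r across folds.
    """
    rs = []
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(layer_feats):
        model = Ridge(alpha=alpha).fit(layer_feats[train], eeg[train])
        pred = model.predict(layer_feats[test])
        rs.append(np.corrcoef(pred, eeg[test])[0, 1])
    return float(np.mean(rs))

# Synthetic demo: 200 "images", 50-dim layer features, one EEG channel/time point
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w = rng.standard_normal(50)
y = X @ w + 0.5 * rng.standard_normal(200)  # EEG driven by the features, plus noise
print(encoding_score(X, y))
```

Repeating this over layers and post-stimulus time points yields a layer-by-time encoding map; under the paper's result, scores from early layers would exceed those from deep, category-specialized layers.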
Significance Statement
This research provides a novel perspective on human electroencephalography (EEG) signals during visual processing through systematic manipulation of scene segmentation demands in deep neural networks. Using a large-scale dataset of 82,160 EEG trials and 20 trained DCNNs, we demonstrate that EEG responses primarily reflect early visual processing involved in breaking down and organizing visual scenes (scene segmentation/parsing) rather than high-level object recognition. This finding helps explain previously observed discrepancies between DCNNs’ categorization performance and neural prediction accuracy, suggesting that improving models’ ability to segment scenes, rather than simply recognizing isolated objects, may better align artificial and biological visual processing.
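The visual-diet manipulation described above can be sketched as a simple image transform. This is an assumption-laden illustration, not the authors' stimulus-generation code: the condition names, the uniform-gray fill for removed regions, and the black silhouette fill are all hypothetical choices.

```python
import numpy as np

GRAY = 127.5  # assumed uniform fill for removed regions

def apply_diet(image, mask, diet):
    """Build one training image from an RGB image and a binary object mask.

    diet: 'original', 'objects_only', 'background_only', or
    'background_silhouette' (hypothetical labels for the four conditions).
    """
    img = image.astype(float)
    m = mask[..., None].astype(float)  # broadcast mask over color channels
    if diet == "original":
        return img
    if diet == "objects_only":            # pre-segmented: object on uniform field
        return img * m + GRAY * (1.0 - m)
    if diet == "background_only":         # object removed, background intact
        return img * (1.0 - m) + GRAY * m
    if diet == "background_silhouette":   # explicit boundary: flat black silhouette
        return img * (1.0 - m) + 0.0 * m  # on the intact background
    raise ValueError(f"unknown diet: {diet}")

# Tiny demo: uniform 4x4 "image" with a 2x2 object region
img = np.full((4, 4, 3), 200.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
out = apply_diet(img, mask, "objects_only")
```

Training otherwise-identical DCNNs on each diet, as in the study, isolates how much segmentation work the network must learn to do itself.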