Preregistered movie-fMRI analyses reveal altered visual feature encoding in autism in pSTS
Curation statements for this article:-
Curated by eLife
eLife Assessment
This valuable study uses naturalistic movie-viewing fMRI and stacked encoding models to investigate sensory feature representations in autistic and non-autistic youth, showing a relative shift toward low-level visual representations in higher-order social cortical regions in autism. The evidence is solid overall, supported by preregistration, a relatively large open dataset, and sophisticated encoding-model analyses, although several methodological and interpretive issues require further clarification and validation. The work will interest researchers in developmental cognitive neuroscience and naturalistic neuroimaging.
This article has been Reviewed by the following groups
- Evaluated articles (eLife)
Abstract
Sensory–perceptual differences are widely reported in autism, yet their underlying mechanisms remain unclear. We tested preregistered hypotheses using stacked encoding models applied to naturalistic movie-viewing fMRI from children and adolescents with and without an autism diagnosis from the Healthy Brain Network. We mapped cortical responsiveness to low- and high-level auditory and visual feature spaces. Contrary to enhanced perceptual functioning predictions, autism was not associated with increased low-level encoding in primary sensory cortices. Instead, autistic children and adolescents had reduced high-level visual representations and a relative shift toward low-level over high-level feature encoding in integration and social brain regions including the pSTS and adjacent face/social areas. In pSTS, this high–low weighting tracked Social Responsiveness Scale (SRS) scores. By contrast, audio–visual modality preference and sensory dominance were broadly conserved across groups. Developmentally, encoding exhibited strong, lateralized, modality-congruent age effects. Together, these findings favor weak central coherence accounts over early sensory enhancement, constrain mechanisms to altered visual feature weighting within social/multisensory networks, and demonstrate the value of naturalistic stimuli and encoding models for characterizing sensory-perceptual neurodevelopmental differences.
Article activity feed
-
Reviewer #1 (Public review):
Summary:
This study uses stacked encoding models to characterize differences in sensory (visual and auditory) processing between autistic and non-autistic children and adolescents. The authors found no significant enhancement of low-level feature encoding in either visual or auditory cortex, but reduced high-level visual representations and a relative shift toward low-level over high-level visual feature encoding in the posterior superior temporal sulcus (pSTS). The shift in pSTS correlated with social symptom severity (SRS scores). These findings support weak central coherence (WCC) theory over enhanced perceptual functioning (EPF) theory, suggesting an altered visual feature encoding in pSTS in autism.
Strengths:
This study uses sophisticated methodology and an open data set with a relatively large sample size. fMRI data are acquired during a naturalistic paradigm (i.e., movie watching), which promotes attention and engagement among participants, and provides greater ecological validity. The use of encoding models to explore population-level differences in neural representations of stimulus-computable features is novel. Overall, results provide somewhat modest yet still informative evidence for adjudicating between possible theories of altered sensory processing in autism.
Weaknesses:
Some important methodological details are missing and/or require justification. Some potential confounding factors or unconsidered differences between individuals and/or diagnostic groups should be explored and possibly addressed. Specific major and minor points are raised below.
Major comments:
(1) Unclear description of noise ceiling calculation (lines 205-206, 632-634) and potential heterogeneity: it is not clear what data were "split" for the split-half correlation used to calculate noise ceilings. To our knowledge, each participant watched each movie only once, so there is no within-subject repetition available. Were these correlations computed across participants (i.e., ISC)? If so, does this across-subject metric provide a fair representation of the true noise ceiling, given that a) encoding models themselves are trained within subjects and b) autistic individuals are known to exhibit more idiosyncrasy in responses to naturalistic stimuli (e.g., Hasson et al., 2008)? Moreover, do noise ceilings differ between individual participants, between diagnostic groups, and/or with age? If so, how might these differences affect the interpretation of results (e.g., R² differences)?
(2) Possibly underperforming visual model: given that the visual model generally performed worse than the audio model, the visual vs. audio perceptual preference analyses (lines 281-290) might be affected by this mismatch in model performance. Although the visual and auditory regions showed similar noise ceilings (Figure 2 S1B), the stacked model performed better in auditory regions than in visual or multimodal regions (Figure 2 S1A). Consistent with this, the visual model generally showed lower R² than the audio model (Figure 2 S2A, Figure 2 S3A vs. B). Instead of using mean motion energy (lines 608-614), applying PCA to the raw motion-energy features might help reduce the noise inherent in those features (Malik et al., 2026) and thereby improve model performance.
(3) The clipping procedure for unique variance (lines 634-637) requires justification: unique variance is defined by subtracting high-level R² from stacked R², with explicit clipping when high-level R² is negative or exceeds stacked R². However, in the original stacked regression framework (Lin et al., 2024), unique variance is defined by simple subtraction without such post-hoc adjustment, because a negative R² is still meaningful: it indicates that the model performs worse than predicting the mean. How frequently does clipping occur, and in which brain regions? Is it an indicator of overfitting or poor model performance? How substantially do results change if clipping is removed, e.g., for the hemisphere dominance comparison (lines 271-280, Figure 6)? Critically, does this procedure affect the key finding regarding SRS/sensory symptom severity correlations in pSTS?
(4) The interpretation of the correlation between SRS and neural patterns is misleading (lines 237-242, 364-366): based on Figure 3, SRS and SSS showed a more significant and robust relationship with the unique variance of high-level visual features. This suggests that the decrement in high-level feature encoding in STSvp and STSdp, rather than the relative low-level preference, is likely driving the relationship with autism severity and sensory symptoms.
(5) Details are missing about how data from the two movie runs were combined. Were the time series concatenated without regard to which movie they originally came from, or was the distinction between movies taken into account when splitting data into train/test cross-validation folds? The results would be stronger if the authors could show that they replicate across the two movies when each is analyzed independently, though we recognize that there is perhaps not enough data, especially in the shorter (~4 min) movie, to do this. The authors discussed this in lines 412-417, but it would be helpful to provide a justification in the Methods section as well.
(6) Potential feature weight differences across individuals and/or diagnostic categories: since the encoding models were trained for each subject, is there significant variability in feature weights across individuals and/or diagnostic categories (e.g., did the model predictions rely heavily on face features for the non-ASD group but not for the ASD group)? If so, how does this change the interpretation of the R² comparisons? The authors showed the results of stacked feature weight differences between diagnostic categories and their relationship with autism severity and sensory symptoms, but it might be informative to show the raw feature weightings before diving into stacked-weight differences.
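To make the question in major comment (3) concrete, the following Python sketch compares unique variance computed by simple subtraction with a clipped version, and reports how often clipping changes a value. It is a hypothetical illustration, not the authors' pipeline: the function names are invented here, and the clipping rule (bounding high-level R² to the range from 0 to the stacked R² before subtracting) is an assumption based on the description in the comment.

```python
import numpy as np

def unique_variance(stacked_r2, high_r2, clip=True):
    """Unique low-level variance = stacked R² minus high-level R².

    With clip=True, high-level R² is first bounded to [0, stacked R²]
    (assumed rule; the manuscript's exact procedure may differ).
    """
    stacked_r2 = np.asarray(stacked_r2, dtype=float)
    high_r2 = np.asarray(high_r2, dtype=float)
    if clip:
        high_r2 = np.minimum(np.maximum(high_r2, 0.0), stacked_r2)
    return stacked_r2 - high_r2

def clipping_rate(stacked_r2, high_r2):
    """Fraction of grayordinates whose value is changed by clipping."""
    raw = unique_variance(stacked_r2, high_r2, clip=False)
    clipped = unique_variance(stacked_r2, high_r2, clip=True)
    return float(np.mean(~np.isclose(raw, clipped)))

# Toy values for four grayordinates (made up for illustration only).
stacked = np.array([0.10, 0.05, 0.08, 0.02])
high = np.array([0.04, -0.02, 0.09, 0.02])

print("raw:    ", unique_variance(stacked, high, clip=False))
print("clipped:", unique_variance(stacked, high, clip=True))
print("fraction changed by clipping:", clipping_rate(stacked, high))
```

Reporting a clipping rate of this kind per parcel and per diagnostic group, alongside results recomputed with clip=False, would address the questions of how often clipping occurs and whether it affects the pSTS findings.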
-
Reviewer #2 (Public review):
Summary:
This study by Mentch et al. uses naturalistic-movie fMRI and grayordinate-level stacked encoding models to test preregistered hypotheses about low/high-level and audio/visual feature encoding in autism and adolescence, using openly available Healthy Brain Network data. The authors report a null result: autism was not linked to increased low-level encoding in primary sensory cortices. Exploratory analyses showed that participants with autism had reduced high-level visual encoding in social regions (pSTS, face areas), with the high-low feature shift tracking Social Responsiveness Scale (SRS) scores. Age and laterality effects were also found.
Strengths:
(1) This study and hypotheses were preregistered.
(2) The study utilised proper variance partitioning, split-half noise ceilings, FD-threshold sensitivity analyses, and an explicit modelling framework that recovers known sensory hierarchies in the aggregated sample. The developmental sampling adds to the interest.
(3) The manuscript is written clearly, laying out the background and theories to be tested with encoding models. The analyses and reporting of results are clear.
Weaknesses:
(1) If I understand correctly, parcel values were computed by averaging only the grayordinates that had already passed a significance threshold, so the resulting parcel value is guaranteed to look stronger than if all grayordinates had been included. This circularity issue has been raised in the neuroimaging literature (Kriegeskorte et al., 2009; Vul et al., 2009). Can the authors justify this choice?
(2) I assume that the phrase "temporally permuting the order of observations" on Page 22 means random shuffling of time points. The details of this permutation are not specified. Both the fMRI BOLD signal and the movie features have strong temporal autocorrelation, and random shuffling will destroy this structure. This is important because grayordinate-level survivors will propagate to parcel pools. Circular shifting or phase randomization, which preserve the autocorrelation spectrum, would be more appropriate.
(3) In the movie feature selection, the low-level visual model contains only two scalars: mean perceptual brightness and a single value averaged across 2,139 motion-energy filters. With only two features, the low-level visual model could underestimate low-level visual encoding. The null result for H1.1 may partly reflect this. Principal components of the motion-energy outputs, as was done for the cochleagram, could be used instead.
(4) The pilot sample composition is not described. Features were selected based on their performance on an independent set of 54 pilot subjects. Please provide the age, sex, and diagnostic composition of the pilot sample. The main question is whether the selected features were optimised for a population that differs from the one studied.
(5) The authors acknowledge the lack of eye-tracking in their study. I think this should be elaborated, especially why eye-tracking is important for questions about sensory and perceptual encoding. Face encoding may not be degraded; rather, faces may simply not be attended to.
(6) I think a more nuanced distinction about the representational nature of encoding-model R² should be mentioned, especially when the interpretation of findings is related to perceptual functioning (EPF theory). R² measures how well a feature set predicts brain activity, not perceptual function or cognitive integration.
(7) The literature also includes evidence for no Colavita effect, not just reverse Colavita in autism, and the framing should reflect this more even-handedly.
(8) The 0.2 mm per-volume threshold is quite strict. The 40%/60%/80% sensitivity analyses partially address this, but a brief justification for the choice of 0.2 mm would strengthen the Methods.
(9) Figure 1 seems confusing and would benefit from more information or text in the figure.
(10) Figure 2 supplement has caption A labelled twice; please correct.
(11) Acronyms. Please spell out MSI on first mention (page 2) and ISC/ISFC on first mention (page 4).
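The alternative suggested in weakness (2) can be sketched as follows. This Python example builds a null distribution for an encoding-model correlation by circularly shifting the predicted time series, which keeps its autocorrelation intact, instead of shuffling time points. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the minimum shift of 10 volumes, and the simulated data are all hypothetical.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation between two 1-D time series."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def circular_shift_null(y_true, y_pred, n_perm=1000, min_shift=10, seed=0):
    """Null correlations from circularly shifting the prediction.

    Each shift is at least min_shift volumes away from alignment in
    either direction; requires len(y_true) > 2 * min_shift.
    """
    rng = np.random.default_rng(seed)
    n = len(y_true)
    shifts = rng.integers(min_shift, n - min_shift + 1, size=n_perm)
    return np.array([pearson_r(y_true, np.roll(y_pred, s)) for s in shifts])

def permutation_p(observed, null):
    """One-sided p-value with the usual +1 correction."""
    return (1 + np.sum(null >= observed)) / (1 + len(null))

# Simulated, temporally smooth "prediction" and a noisy "BOLD" response.
rng = np.random.default_rng(1)
pred = np.convolve(rng.standard_normal(300), np.ones(8) / 8, mode="same")
bold = pred + 0.5 * rng.standard_normal(300)

observed = pearson_r(bold, pred)
null = circular_shift_null(bold, pred)
print("observed r:", observed)
print("p (circular-shift null):", permutation_p(observed, null))
```

Because np.roll only rotates the series, every shifted copy has the same circular autocorrelation as the original, so the null reflects chance alignment between two autocorrelated signals. Phase randomization is the other option named in the comment and is not shown here.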
-
Reviewer #3 (Public review):
Summary:
This study investigates the neural mechanisms underlying sensory-perceptual differences in autism through a naturalistic movie-viewing fMRI paradigm. By employing encoding models, the authors demonstrate that autistic children and adolescents exhibit a specific alteration in visual feature weighting, characterized by a shift toward low-level visual feature encoding in higher-order association regions, particularly the posterior Superior Temporal Sulcus (pSTS). This shift is linked to social symptom severity, providing empirical support for Weak Central Coherence accounts.
Strengths:
The study's primary strengths lie in its methodological rigor and innovative approach. The use of a pre-registered analysis plan ensures transparency and enhances the credibility of the findings, while the encoding models allow for a fine-grained dissociation of low-level versus high-level feature representations across the cortex. Overall, the writing is clear, the logic is sound, and the results offer a significant contribution to the field by refining our understanding of how sensory processing is differentially organized in autism.
Weaknesses:
While the study presents compelling findings regarding visual feature encoding in autism, several methodological and interpretive limitations warrant consideration. First, the Discussion focuses primarily on WCC and EPF theories, failing to explicitly address how the results intersect with other prominent frameworks mentioned in the Introduction, such as Bayesian predictive coding or E/I imbalance hypotheses. Second, the demographic characteristics and specific sample sizes of the ASD-ADHD and ASD+ADHD subgroups are not reported, limiting the interpretability of the stratified analyses; furthermore, the counterintuitive finding that the ASD+ADHD group resembles controls is not sufficiently discussed. Third, given the significant group difference in IQ and the known relationship between cognitive ability and neural processing, the potential confounding influence of IQ on the neuroimaging results requires more explicit acknowledgment, particularly since IQ was not included as a covariate in the primary models.
-