Overlooked features lead to divergent neurobiological interpretations of brain-based machine learning biomarkers



Abstract

A central objective in human neuroimaging is to understand the neurobiology underlying cognition and mental health. Machine learning models trained on brain connectivity data are increasingly used as tools for predicting behavioral phenotypes [1,2], enhancing precision medicine [3,4], and improving generalizability compared to traditional MRI studies [5]. However, the high dimensionality of brain connectivity data makes model interpretation challenging [6]. Prevailing practices within the field rely on sparsely selected brain connectivity features, implicitly interpreting the identified feature networks as uniquely representative of a given phenotype while overlooking the remaining features. Here, we show that these commonly overlooked brain connectivity features can achieve similar prediction accuracies while yielding markedly different neurobiological interpretations. Using four large-scale neuroimaging datasets spanning over 12,000 participants and 13 outcomes, we demonstrate that this phenomenon is widespread across cognitive, developmental, and psychiatric phenotypes. It extends to both functional (fMRI) and structural (DTI) connectomes and remains evident even in external validation. These findings suggest that common practices may lead to feature overinterpretation and a misrepresentation of the neurobiological bases of brain-behavior associations. Such interpretations present only the ‘tip of the iceberg’ when certain disregarded features may be just as meaningful, potentially contributing to ongoing reproducibility issues within the field. More broadly, our results point to the possibility that multiple neurobiologically distinct models may exist for the same phenotype, with implications for identifying meaningful subtypes within clinical and research populations.
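The central claim, that features left out by sparse selection can predict about as well as the selected ones, can be illustrated with a toy simulation. The sketch below is not the authors' pipeline: it assumes purely synthetic data in which every feature is a noisy copy of one latent trait, uses a simple sum-score model with a linear fit as a stand-in for a sparse connectome-based predictor, and all names (`sum_score`, `evaluate`, `selected`, `overlooked`) are hypothetical.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def sum_score(rows, idx):
    """Sum the chosen feature columns for every subject."""
    return [sum(row[j] for j in idx) for row in rows]


def evaluate(train_X, train_y, test_X, test_y, idx):
    """Fit on the training sum score, return test-set prediction accuracy (r)."""
    slope, intercept = fit_line(sum_score(train_X, idx), train_y)
    pred = [slope * s + intercept for s in sum_score(test_X, idx)]
    return pearson(pred, test_y)


# Synthetic data (assumption): each feature is a noisy copy of one latent trait.
rng = random.Random(0)
n_subjects, n_features, k = 200, 60, 10
latent = [rng.gauss(0, 1) for _ in range(n_subjects)]
X = [[z + rng.gauss(0, 2) for _ in range(n_features)] for z in latent]
y = [z + rng.gauss(0, 0.5) for z in latent]

# Split into training and held-out halves.
train_X, test_X = X[:100], X[100:]
train_y, test_y = y[:100], y[100:]

# Rank features by their training-set correlation with the outcome.
r_train = [pearson([row[j] for row in train_X], train_y) for j in range(n_features)]
ranked = sorted(range(n_features), key=lambda j: r_train[j], reverse=True)

selected = ranked[:k]          # the sparse set that would usually be interpreted
overlooked = ranked[k:2 * k]   # the next-best features that would be discarded

r_selected = evaluate(train_X, train_y, test_X, test_y, selected)
r_overlooked = evaluate(train_X, train_y, test_X, test_y, overlooked)
print(f"selected features:   r = {r_selected:.2f}")
print(f"overlooked features: r = {r_overlooked:.2f}")
```

Because the simulated signal is spread redundantly across features, the two disjoint feature sets should reach comparable held-out accuracy, even though interpreting only `selected` would point to a different set of connections than interpreting `overlooked`. How strongly this holds in real connectomes depends on the data and is what the article itself evaluates.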
