Topological data analysis communities reveal gene-environment-brain subtypes of major depression in UK Biobank and multi-site cohorts

Abstract

Major depressive disorder (MDD) exhibits substantial clinical heterogeneity, which complicates defining prognosis and selecting treatment. Characterizing MDD subtypes by their distinct clinical manifestations could improve personalized therapeutic approaches. We developed a topological data analysis (TDA) framework with graph-based community detection to identify homogeneous patient subgroups from integrated multimodal data. We applied the TDA pipeline to UK Biobank MDD participants with gene-environment (G-E, N=20,715) and gene-environment-neuroimaging (G-E-I, N=3,044) data. We systematically compared the predictive performance of genetic, environmental, and neuroimaging features, alone and combined, for 18 health-related outcomes. For the most predictive feature set identified for each outcome, a novel two-stage feature ranking approach identified the features relevant for graph construction and for community-based outcome differentiation. Cross-cohort validation used two independent datasets.
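The general idea of grouping patients through a graph and then profiling each group by an outcome can be illustrated with a minimal sketch. It is not the authors' pipeline: a plain k-nearest-neighbour graph stands in for the TDA graph construction, deterministic label propagation stands in for the unspecified community detection method, and the function names, toy features, and binary outcome are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def knn_graph(points, k):
    """Build an undirected k-nearest-neighbour graph as an adjacency dict.

    Assumption: a kNN graph replaces the TDA graph used in the study.
    """
    adj = defaultdict(set)
    for i, p in enumerate(points):
        # Rank the other points by Euclidean distance, breaking ties by index.
        ranked = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        for j in ranked[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def label_propagation(adj, n_nodes, max_iter=100):
    """Deterministic label propagation: each node adopts its neighbours' most
    common label, with ties resolved in favour of the smallest label."""
    labels = list(range(n_nodes))
    for _ in range(max_iter):
        changed = False
        for node in range(n_nodes):
            if not adj[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nb] for nb in adj[node])
            top = max(counts.values())
            best = min(lab for lab, c in counts.items() if c == top)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def outcome_rate_by_community(labels, outcome):
    """Fraction of members with a binary outcome (1 = present) per community."""
    members = defaultdict(list)
    for lab, y in zip(labels, outcome):
        members[lab].append(y)
    return {lab: sum(ys) / len(ys) for lab, ys in members.items()}


# Toy data: two well-separated groups of four "patients", two features each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
outcome = [1, 1, 1, 0, 0, 0, 0, 1]  # hypothetical binary outcome

adj = knn_graph(points, k=2)
labels = label_propagation(adj, len(points))
rates = outcome_rate_by_community(labels, outcome)
```

On this toy input the two spatial groups fall into separate communities, and `rates` shows how outcome prevalence can differ between them, which is the kind of community-level contrast that profiling relies on.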

G-E interactions demonstrated superior predictive performance for 13 clinical outcomes, including treatment-resistant depression (TRD), symptom subtypes, and suicidal phenotypes. Community profiling revealed distinct vulnerability pathways: trauma-stress exposures were linked to TRD and episode severity, while substance-behavioral profiles were associated with anxious symptoms. Environmental factors emerged as the primary determinants of most health outcomes, whereas neuroimaging features best predicted medical comorbidities. Cross-cohort validation confirmed replication for multiple outcomes: self-harm behavior and anxious features (GSRD), and TRD and vascular diseases (HSR), with consistent environmental stress-related predictive features across cohorts. TDA identified clinically relevant MDD subgroups with distinct multimodal signatures. These findings underscore the essential role of integrating genetic, environmental, and neuroimaging characteristics for robust health outcome prediction, and they establish TDA-based community detection as an effective framework for MDD patient stratification that advances precision medicine approaches in depression management.

Significance Statement

Topological data analysis (TDA) combined with community detection was used to identify clinically meaningful subgroups within major depressive disorder (MDD) from multimodal UK Biobank data integrating genetic, environmental, and neuroimaging features. We systematically compared unimodal and multimodal feature sets to stratify patients across 18 health-related outcomes. Gene-by-environment interactions emerged as the best predictors of mental health outcomes, revealing distinct vulnerability pathways: trauma-stress profiles predicted treatment resistance and episode severity, while substance-behavioral patterns were linked to anxious and neurovegetative symptoms. Brain imaging features best predicted medical comorbidities, particularly vascular diseases. Validation in two independent cohorts confirmed replication for multiple outcomes. These findings establish TDA-based community detection as a powerful framework for MDD stratification, advancing precision psychiatry and personalized intervention strategies.
