Interpretable machine learning classifiers implicate GPC6 in Parkinson’s disease from single-nuclei midbrain transcriptomes
Abstract
Parkinson’s disease (PD) is a progressive and devastating neurodegenerative disease. An incomplete understanding of its genetic architecture remains a major barrier to the clinical translation of targeted therapeutics, necessitating novel approaches to uncover elusive genetic determinants. Single-cell and single-nuclear RNA sequencing (scnRNAseq) can help bridge this gap by profiling individual cells for disease-associated differential gene expression and nominating genes for targeted genomic analyses. Here, we introduce a machine learning framework to identify molecular features that characterize post-mortem brain cells from PD patients. We train classifiers to distinguish between PD and healthy cells, then decode the models to unravel the ‘reasons’ behind the classifications, revealing key gene expression signatures that characterize cells from the parkinsonian brain. Applying this framework to three publicly available snRNAseq datasets characterizing the post-mortem midbrain identified cell-type-specific gene sets that accurately classify PD cells across all datasets, demonstrating our approach's capacity to identify robust molecular markers of disease. Targeted genomic analyses of the key genes characterizing PD cells revealed a previously undescribed association between PD and rare variants in GPC6, a member of the heparan sulfate proteoglycan family, whose members have been implicated in the intracellular accumulation of α-synuclein preformed fibrils. We replicate this association in three separate case-control cohorts. Our method promises to enhance understanding of the genetic architecture of complex diseases like PD, representing a critical step toward targeted therapeutics. Our publicly available framework is readily applicable across diseases.
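The "train a classifier, then decode it" idea can be illustrated with a minimal sketch. This is not the authors' framework: the abstract does not name the classifier or the interpretation method, so the sketch assumes a plain logistic regression whose weights are read directly as gene importances. The data are synthetic, and the gene names (`GENE_A` to `GENE_E`), the effect size, and the helper functions are all hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical gene names; only GENE_C is shifted in simulated "PD" cells.
GENES = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
INFORMATIVE = 2  # index of GENE_C


def simulate_cells(n_cells):
    """Return (expression rows, labels) for synthetic cells; label 1 = PD."""
    rows, labels = [], []
    for i in range(n_cells):
        label = i % 2  # alternate control / PD so classes are balanced
        row = [random.gauss(0.0, 1.0) for _ in GENES]
        row[INFORMATIVE] += 2.5 * label  # assumed disease-associated shift
        rows.append(row)
        labels.append(label)
    return rows, labels


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability that a cell with expression vector x is a PD cell."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(rows, labels, lr=0.1, epochs=300):
    """Fit logistic regression with full-batch gradient descent."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(w, b, x) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


rows, labels = simulate_cells(400)
train_x, train_y = rows[:300], labels[:300]
test_x, test_y = rows[300:], labels[300:]

w, b = train_logistic(train_x, train_y)

# Step 1: check that the classifier separates PD from control cells
# on held-out data.
correct = sum(
    int((predict_proba(w, b, x) >= 0.5) == bool(y))
    for x, y in zip(test_x, test_y)
)
accuracy = correct / len(test_y)

# Step 2: "decode" the model by ranking genes on absolute weight.
ranked = sorted(zip(GENES, w), key=lambda gw: abs(gw[1]), reverse=True)

print(f"held-out accuracy: {accuracy:.2f}")
for gene, weight in ranked:
    print(f"{gene}: {weight:+.3f}")
```

In this toy setting the gene carrying the simulated shift should rise to the top of the ranking. On real snRNAseq data, the same two steps would be run per cell type, and genes that rank highly would be candidates for follow-up genomic analysis rather than confirmed disease genes.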