Systematic identification of disease-associated 3D neighborhoods in protein structures

Abstract

Rare variant association studies (RVAS) have identified hundreds of genes contributing to human disease, yet gene-level signals provide limited insight into the molecular mechanisms underlying pathogenicity. Missense variants, which can be mapped onto three-dimensional protein structures, offer an opportunity to gain new mechanistic insight. Here, we develop a scalable framework for systematically mapping case and control variants onto protein structures and identifying spatially localized regions enriched for case variants. Our framework builds on the 3D Neighborhood Test (3DNT), which we recently introduced in a single-gene analysis of ATP2B2, and enables genome-wide analysis of rare coding variation beyond standard gene-level approaches. We applied 3DNT across multiple large-scale datasets, including Mendelian disease variants from ClinVar, de novo mutations from 37,486 autism spectrum disorder (ASD) probands, and case-control exome sequencing cohorts for epilepsy and schizophrenia. We identified significant clusters in 872 genes for Mendelian disease, in 70 genes for autism, in one gene for epilepsy, and in three genes for schizophrenia. These clusters are strongly enriched for known functional sites and provide insight into both known and previously unrecognized disease genes. Our results demonstrate that integrating RVAS data with protein structure predictions at scale localizes disease-associated variation to specific functional regions and reveals a layer of disease biology that is largely invisible to standard analyses.
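The abstract does not spell out the 3DNT statistic, so the following is only a hedged illustration of the general idea of scanning 3D neighborhoods for case enrichment, not the authors' implementation. It assumes one coordinate per residue (for example a C-alpha position from a predicted structure), defines each neighborhood as all residues within a fixed radius (8.0 here, an arbitrary choice), scores each neighborhood with a one-sided binomial test against the gene-wide case fraction, and corrects for scanning every center by shuffling case/control labels across the observed variant positions. The function names `neighborhoods`, `binom_sf`, `best_neighborhood` and `permutation_pvalue`, the radius, and the toy geometry are all hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]
Variant = Tuple[int, bool]  # (residue index, is_case); hypothetical input format


def neighborhoods(coords: Dict[int, Coord], radius: float) -> Dict[int, List[int]]:
    """For each center residue, list residues within `radius` (inclusive, Euclidean)."""
    r2 = radius * radius
    return {
        i: [j for j, cj in coords.items()
            if sum((a - b) ** 2 for a, b in zip(ci, cj)) <= r2]
        for i, ci in coords.items()
    }


def binom_sf(k: int, n: int, p: float) -> float:
    """One-sided tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


def best_neighborhood(variants: List[Variant],
                      nbhd: Dict[int, List[int]]) -> Tuple[Optional[int], float]:
    """Return (center, p-value) of the most case-enriched neighborhood.

    Assumption: enrichment is tested against the gene-wide case fraction.
    """
    p0 = sum(is_case for _, is_case in variants) / len(variants)
    best_center, best_p = None, 1.0
    for center, members in nbhd.items():
        member_set = set(members)
        inside = [is_case for res, is_case in variants if res in member_set]
        if not inside:
            continue  # no variants fall in this neighborhood
        p = binom_sf(sum(inside), len(inside), p0)
        if p < best_p:  # strict '<' keeps the first center on ties
            best_center, best_p = center, p
    return best_center, best_p


def permutation_pvalue(variants: List[Variant], nbhd: Dict[int, List[int]],
                       n_perm: int = 999, seed: int = 0) -> float:
    """Gene-level p-value for the best neighborhood, accounting for scanning all
    centers by shuffling case/control labels over the observed variant positions."""
    _, observed = best_neighborhood(variants, nbhd)
    residues = [res for res, _ in variants]
    labels = [is_case for _, is_case in variants]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        _, p = best_neighborhood(list(zip(residues, labels)), nbhd)
        if p <= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)


# Toy example: 30 residues on a straight line, 3.8 units apart (hypothetical geometry).
coords = {i: (3.8 * i, 0.0, 0.0) for i in range(30)}
nbhd = neighborhoods(coords, radius=8.0)
variants = ([(r, True) for r in (1, 1, 2, 2, 3, 3)]
            + [(r, False) for r in (10, 14, 18, 22, 26, 29)])
center, p = best_neighborhood(variants, nbhd)
print(center, p)  # 1 0.015625
print(permutation_pvalue(variants, nbhd))
```

In the toy data the six case variants sit on adjacent residues while the controls are spread out, so the neighborhood centered on residue 1 captures all six cases (p = 0.5^6). The permutation step matters because taking the minimum p-value over many overlapping neighborhoods would otherwise overstate significance; a real analysis would also need to handle structure quality, variant weighting, and genome-wide multiple testing, none of which this sketch attempts.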