Pathogenic variations illuminate functional constraints in intrinsically disordered proteins
Abstract
Intrinsically disordered regions (IDRs) play key roles in cellular signaling and regulation, yet their contribution to human disease remains poorly understood. Here we analyzed nearly one million ClinVar missense variants, focusing on those located within IDRs defined by curated and predicted annotations. We found that pathogenic variants were significantly enriched in short linear motifs (SLiMs) and disordered binding regions, consistent with their central functional importance. To extend these insights beyond existing annotations, we applied AlphaMissense, a deep-learning pathogenicity predictor, and uncovered localized “island-like” patterns of elevated pathogenicity within IDRs. Leveraging these signals, we developed a classifier to prioritize predicted ELM motifs (PEMs), revealing thousands of candidate functional sites linked to major disease classes, including neurological, cardiovascular, and cancer-associated genes. Case studies in POLK, FOXP2, and LMOD3 illustrate how this framework connects genetic variation to molecular mechanisms, providing a scalable route to interpret variants of uncertain significance and advancing our understanding of pathogenicity in the disordered proteome.
Summary
This study reveals how deep-learning pathogenicity predictions can uncover functional motifs within intrinsically disordered regions, providing a new framework for interpreting genetic variation in the disordered proteome.