Enhancing Missense Variant Classification in Predicted Intrinsically Disordered Regions
Abstract
The accurate classification of missense variants is a fundamental challenge in genomics, particularly for those within intrinsically disordered regions (IDRs), where the performance of existing computational predictors is suboptimal. To address this, we developed a machine learning model that extends traditional missense tools with properties that infer globular IDR conformation, phase separation, and protein embeddings. Using ClinVar variant classifications as ground truth, we found that AlphaMissense, EVE, and ESM1b were the highest-scoring unsupervised in silico missense predictors for IDR variants. Our baseline model, using only IDR-specific features, achieved competitive performance on the hold-out test set with a PR-AUC of 0.800. Critically, when these IDR features were combined with each of the three predictors, performance improved in every case. The AlphaMissense-Enhanced model raised PR-AUC from 0.807 to 0.931. Similarly, ESM1b-Enhanced improved PR-AUC from 0.679 to 0.878, and EVE increased from 0.591 to 0.918. These results demonstrate the effectiveness of our enhancements for classifying missense variants in IDRs and highlight their ability to complement existing in silico missense predictors.
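For readers unfamiliar with the metric, the following is a minimal sketch of how a PR-AUC comparison of this kind can be computed. It assumes PR-AUC is estimated as average precision (the step-wise area under the precision-recall curve); the abstract does not state which estimator was used. The function name `average_precision`, the label encoding, and the toy labels and scores are hypothetical illustrations and are not data from the study.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for pathogenic, 0 for benign (hypothetical encoding).
    scores: predictor output, where higher means more likely pathogenic.
    Tied scores are ranked in input order here; library implementations
    may group ties into a single threshold instead.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank variants from highest to lowest score (stable for ties).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where this positive is recovered.
            precision_sum += true_positives / rank
    return precision_sum / n_pos


# Invented toy data: six variants scored by a base predictor and by a
# hypothetical enhanced model that also uses IDR-specific features.
labels = [1, 0, 1, 1, 0, 0]
base_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
enhanced_scores = [0.9, 0.3, 0.8, 0.7, 0.2, 0.1]

base_pr_auc = average_precision(labels, base_scores)
enhanced_pr_auc = average_precision(labels, enhanced_scores)
print(f"base PR-AUC:     {base_pr_auc:.3f}")
print(f"enhanced PR-AUC: {enhanced_pr_auc:.3f}")
```

In this toy case the enhanced scores rank every pathogenic variant above every benign one, so the average precision rises to 1.0. PR-AUC is often preferred over ROC-AUC when the classes are imbalanced, because it does not depend on the number of true negatives.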