Random forest model improves annotation and discovery of variants of uncertain significance in Alzheimer’s and other neurological disorders

Abstract

Variants of uncertain significance (VUS) are a bottleneck for genetic discovery and complicate clinical decision-making in Alzheimer’s disease and related neurological disorders (ADRD). We developed MoVUS (Model for Variants of Unknown Significance), a random-forest approach that integrates functional predictors to classify missense VUS.

MoVUS uses a balanced random forest model trained on dbNSFP v5.1a with high-confidence ClinVar and HGMD labels, using harmonized functional prediction rankscores. On independent validation sets of ClinVar-only and HGMD-only variants, MoVUS produced confident, explainable calls with ∼98% accuracy (AUC ∼0.998), prioritizing potentially pathogenic candidates and down-ranking likely benign variants. In discovery analyses of ADRD-implicated variants in dbNSFP and from independent collaborator cohorts, we obtained high-confidence classifications for a majority of the unknown variants (on average, 55% of discovery variants). For some variants, we also had access to medical records and family trees, further validating our findings.
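As a rough illustration of how confidence-based triage of forest output can work, the sketch below turns per-tree votes into a pathogenicity score and a three-way call. It is not the MoVUS implementation: the `triage` and `high_confidence_fraction` helpers, the 0.9 threshold, the hard-vote scoring, and the variant names are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Call:
    variant: str
    p_pathogenic: float  # fraction of trees voting "pathogenic"
    label: str


def triage(variant: str, tree_votes: List[int], threshold: float = 0.9) -> Call:
    """Turn per-tree votes (1 = pathogenic, 0 = benign) into a call.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    p = sum(tree_votes) / len(tree_votes)
    if p >= threshold:
        label = "likely pathogenic"
    elif p <= 1 - threshold:
        label = "likely benign"
    else:
        label = "uncertain"  # left for manual review or further evidence
    return Call(variant, p, label)


def high_confidence_fraction(calls: List[Call]) -> float:
    """Share of variants that received a non-'uncertain' call."""
    if not calls:
        return 0.0
    confident = sum(1 for c in calls if c.label != "uncertain")
    return confident / len(calls)


# Hypothetical variants with made-up votes from a 20-tree forest.
calls = [
    triage("var_A", [1] * 19 + [0]),       # 19 of 20 trees vote pathogenic
    triage("var_B", [0] * 19 + [1]),       # 19 of 20 trees vote benign
    triage("var_C", [1] * 10 + [0] * 10),  # trees split evenly
]
for c in calls:
    print(c.variant, round(c.p_pathogenic, 2), c.label)
print("high-confidence fraction:", round(high_confidence_fraction(calls), 2))
```

A stricter threshold yields fewer but more reliable calls, so the share of variants classified with high confidence depends directly on where that cutoff is set.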

Across held-out and external datasets, MoVUS reports high accuracy alongside confidence scores, helps prioritize actionable candidates, and reduces bias by considering multiple scores for each variant. To facilitate use, we developed a web app in which users can browse more than 100 ADRD genes. MoVUS provides transparent, reproducible triage for research follow-up by pairing consensus predictors with SHAP-based visualizations and explanations.
