Random forest model improves annotation and discovery of variants of uncertain significance in Alzheimer’s and other neurological disorders

Caroline Jonson
Mary B. Makarious
Mathew J. Koretsky
Dan Vitale
Liya Rabkina
Argentina Lario Lago
Manizhe Eslami-Amirabadi
Eliana Marisa Ramos
Priyanka Narayan
Jennifer S. Yokoyama
Andrew B. Singleton
Cornelis Blauwendraat
Mark R. Cookson
Mike A. Nalls
Hampton L. Leonard

Abstract

Variants of uncertain significance (VUS) are a bottleneck for genetic discovery and complicate clinical decision-making in Alzheimer’s disease and related neurological disorders (ADRD). We developed MoVUS (Model for Variants of Unknown Significance), a random forest approach that integrates functional predictors to classify missense VUS.

MoVUS is a balanced random forest model trained on harmonized functional prediction rankscores from dbNSFP v5.1a, using high-confidence ClinVar and HGMD labels. On independent validation sets of ClinVar-only and HGMD-only variants, MoVUS produced confident, explainable calls with ∼98% accuracy (AUC ∼0.998), prioritizing potentially pathogenic candidates and down-ranking likely benign variants. In discovery analyses of ADRD-implicated variants in dbNSFP and in independent collaborator cohorts, MoVUS made high-confidence classifications for a majority of the unknown variants (55% of discovery variants on average). For some variants, medical records and family trees were also available, providing further validation of these findings.
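
As a rough illustration only (this is not the MoVUS code), the sketch below builds a toy balanced random forest in plain Python. Each tree is a depth-1 decision stump fitted on a class-balanced bootstrap (equal numbers of pathogenic and benign rows, drawn with replacement) using a random subset of features, and the fraction of trees voting pathogenic serves as a confidence score. The rankscore-like features, the stump depth, the 0.9/0.1 confidence cut-offs, the assumption that higher rankscores mean "more damaging", and all function names are hypothetical.

```python
import random

def fit_stump(X, y, feature_ids):
    """Return the (feature, threshold) split among feature_ids with the fewest training errors.

    A row is called pathogenic (1) when its value for that feature is >= threshold.
    Assumes higher rankscores mean "more likely damaging".
    """
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            errors = sum((row[f] >= t) != (label == 1) for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]

def balanced_bootstrap(X, y, rng):
    """Draw an equal number of pathogenic and benign rows, with replacement.

    The per-class sample size equals the minority-class count, so every tree
    is trained on a class-balanced set.
    """
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    n = min(len(pos), len(neg))
    idx = [rng.choice(pos) for _ in range(n)] + [rng.choice(neg) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_forest(X, y, n_trees=50, max_features=2, seed=0):
    """Fit n_trees stumps, each on its own balanced bootstrap and random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        Xb, yb = balanced_bootstrap(X, y, rng)
        feats = rng.sample(range(len(X[0])), max_features)
        forest.append(fit_stump(Xb, yb, feats))
    return forest

def predict_proba(forest, row):
    """Fraction of trees voting pathogenic; used here as the confidence score."""
    return sum(row[f] >= t for f, t in forest) / len(forest)

def triage(p, hi=0.9, lo=0.1):
    """Map a vote fraction to a call; the 0.9/0.1 cut-offs are hypothetical."""
    if p >= hi:
        return "likely pathogenic"
    if p <= lo:
        return "likely benign"
    return "uncertain"

# Hypothetical toy data: 3 rankscore-like features in [0, 1], 20 pathogenic vs 80 benign rows.
data_rng = random.Random(1)
X = [[data_rng.uniform(0.6, 1.0) for _ in range(3)] for _ in range(20)]
X += [[data_rng.uniform(0.0, 0.4) for _ in range(3)] for _ in range(80)]
y = [1] * 20 + [0] * 80

forest = fit_forest(X, y)
print(triage(predict_proba(forest, [1.0, 1.0, 1.0])))  # likely pathogenic
print(triage(predict_proba(forest, [0.0, 0.0, 0.0])))  # likely benign
```

Stumps keep the example short; a real model would grow deeper trees, for example with scikit-learn or imbalanced-learn's `BalancedRandomForestClassifier`. Under this sketch, the share of variants whose vote fraction clears either cut-off would be the analogue of a "high-confidence classification" rate; everything between the cut-offs stays "uncertain".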

Across held-out and external datasets, MoVUS achieves high accuracy, reports a confidence score alongside each call, helps prioritize actionable candidates, and reduces bias by considering multiple scores for each variant. To facilitate use, we developed a web app that lets users browse more than 100 ADRD genes. By pairing consensus predictors with SHAP-based visualizations and explanations, MoVUS provides transparent, reproducible triage for research follow-up.
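
MoVUS explains its calls with SHAP, whose attributions are Shapley values: a feature's contribution is its weighted average marginal effect on the model output over all coalitions of the other features. The sketch below computes exact Shapley values by brute-force enumeration for a hypothetical four-stump model, replacing features outside a coalition with a single baseline vector. This is a simplification under stated assumptions: the SHAP library's `shap.TreeExplainer` uses a polynomial-time algorithm specialized for tree ensembles instead of enumeration, the single baseline here is an assumption of the sketch, and the stumps, inputs, and function names are illustrative rather than MoVUS's actual features or settings.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model: four stumps, each (feature index, threshold); the score is
# the fraction of stumps voting "pathogenic".
STUMPS = [(0, 0.5), (1, 0.7), (0, 0.6), (2, 0.4)]

def forest_score(row):
    return sum(row[f] >= t for f, t in STUMPS) / len(STUMPS)

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features in a coalition keep their value from x; the rest take the baseline value.
    phi_i = sum over coalitions S without i of |S|! (n-|S|-1)! / n! * (v(S + {i}) - v(S)).
    Enumerates 2**(n-1) coalitions per feature, so it only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

x = [0.9, 0.8, 0.1]          # hypothetical rankscores for one variant
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(forest_score, x, baseline)
print([round(v, 3) for v in phi])  # [0.5, 0.25, 0.0]
```

Because each stump depends on a single feature, this toy model is additive, so each feature's Shapley value equals its own share of stump votes (2/4, 1/4 and 0/4). The values also sum to `forest_score(x)` minus `forest_score(baseline)`, the efficiency property that lets SHAP plots decompose a prediction into per-feature contributions.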

Version published on bioRxiv (DOI: 10.1101/2025.10.02.680068)
Oct 2, 2025
