Gene Specific Pathogenicity Predictor for Chromatin-Remodeling BAF Complex-Associated Neurodevelopmental Disorders

Joshua Hack
Mohammad Nazim

Abstract

Advances in whole-genome sequencing have increased the number of variants of uncertain significance (VUS) identified in patient genomes. This has created a diagnostic bottleneck for genetic counselors, who must sift through these variants and determine which are most likely to be causative for a patient's clinical presentation. Machine learning (ML) tools can help identify pathogenic variants among VUS, but there is a need for gene-specific algorithms that predict pathogenic variants with high accuracy. To address this need, we present a workflow for developing gene-specific, ensemble-learning ML tools that leverage outputs from other algorithms, the locations of variants within the gene, and evolutionary conservation data to predict pathogenicity. Variants in SMARCA2 and SMARCA4 that are associated with rare neurodevelopmental diseases were used to screen 15 ML algorithms. A random forest learner was tuned to yield a final accuracy of 0.93 on holdout data. Generalizing this predictor to other BAF complex proteins resulted in a sharp decline in performance. We then trained a final predictor on all genes in the study; it identifies pathogenic variants in these BAF subunits with an accuracy of 0.91 on holdout data. This predictor, specific to BAF complex proteins, performs with higher accuracy and AUROC than any other predictor. The decline in performance when generalized to other proteins emphasizes the need for gene-specific calibration of predictors. Our workflow for developing such models provides a quick, computationally inexpensive route for improving the ML tools available to genetic counselors.
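The abstract reports two holdout metrics, accuracy and AUROC. As a rough illustration of what those numbers measure, the sketch below computes both from a predictor's scores in plain Python. It is not the authors' code: the label encoding (1 = pathogenic, 0 = benign), the 0.5 decision threshold, the function names, and the toy scores are all assumptions made for this example.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of variants whose thresholded score matches the label.

    The 0.5 threshold is an assumption for illustration only.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen pathogenic variant receives a
    higher score than a randomly chosen benign one (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one variant of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy holdout set: hypothetical values, not data from the preprint.
holdout_labels = [1, 1, 1, 0, 0, 0, 1, 0]
holdout_scores = [0.92, 0.81, 0.40, 0.10, 0.35, 0.55, 0.77, 0.20]

print(f"accuracy = {accuracy(holdout_labels, holdout_scores):.2f}")
print(f"AUROC    = {auroc(holdout_labels, holdout_scores):.4f}")
```

Accuracy depends on the chosen threshold, while AUROC summarizes ranking quality across all thresholds, which is one reason the two are often reported together.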

Version published to 10.1101/2025.09.11.675179 on bioRxiv
Sep 17, 2025

Related articles

An Integrative Variant Scoring Function for Finding Novel Genes Associated with Ovarian and Thyroid Cancer

This article has 5 authors:
1. Amanda Bataycan
2. Omodolapo Nurudeen
3. Jonathon E. Mohl
4. Khodeza Begum Mitchell
5. Ming-Ying Leung
This article has no evaluations
Latest version Jan 7, 2026

Benchmarking RNA-seq Tools for Real-World Diagnostic Applications

This article has 15 authors:
1. Sarah Silverstein
2. Kaushik Ganapathy
3. Sandra Donkervoort
4. Veronique Bolduc
5. Ying Hu
6. Justin Moy
7. Prech Uapinyoying
8. Svetlana Gorokhova
9. Vijay Ganesh
10. Ben Weisburd
11. Rotem OrBach
12. A. Reghan Foley
13. Pejman Mohammadi
14. David Adams
15. Carsten Bonnemann
This article has no evaluations
Latest version Jan 29, 2026

Protein Language Models Rescue Variant Pathogenicity Prediction in Intrinsically Disordered Regions Through Synergistic Integration with Structure-Based Methods

This article has 1 author:
1. Hayden Farquhar
This article has no evaluations
Latest version Feb 4, 2026
