NCBoost v2: a classifier for non-coding variants in Mendelian diseases

Barthélémy Caron
Antonio Rausell

Abstract

Motivation

The current diagnostic rate of rare diseases through whole-genome sequencing has stabilized at around 30% on average, highlighting the need for improved computational scores to identify pathogenic variants. In 2019, we developed NCBoost, a supervised-learning approach that mined a comprehensive set of sequence constraint features and proved particularly well suited to identifying high-effect pathogenic non-coding variants in genetic diseases. Since its first release, the substantial increase in the number of variants available for training, as well as the enhanced capacity to detect purifying selection signals from large-scale genome sequencing projects, motivated an update of NCBoost.

Results

We implemented NCBoost v2, a pathogenicity score for non-coding single-nucleotide variants, trained on the largest set of curated pathogenic variants in monogenic Mendelian diseases available to date. It leverages conservation features computed from recent large-scale genomic consortia such as Zoonomia and gnomAD, and incorporates recent splice-altering predictive scores. NCBoost v2 outperformed alternative state-of-the-art methods in a variety of scenarios, providing more consistent scores across non-coding genomic regions and fine-tuning the scoring of pathogenic splice-altering variants in Mendelian disease genes.

Availability

NCBoost v2 software is implemented in Python 3.10 and is freely available under the GNU General Public License Version 3 at https://doi.org/10.5281/zenodo.16029049 and https://github.com/RausellLab/NCBoost-2, together with precomputed scores for the human genome assembly GRCh38.

Version published to 10.1101/2025.09.18.25336072 on medRxiv, Sep 19, 2025
