MobiDeep: an AI-based meta-score for scoring non-coding DNA variations

Abdelhakim Bouazzaoui
Jean-Madeleine de Sainte Agathe
Simon Cabello-Aguilar
Ophélie Evrard
Juliette Nectoux
Marina Konyukh
Leila Qebibo
Thibault Coste
Sandrine M. Caputo
Perrine Brunelle
Yohann Jourdy
Cécile Rouzier
Mireille Cossée
Charles Van Goethem
Olivier Ardouin
Vasiliki Kalatzis
Anne-Françoise Roux
David Baux

Abstract

Background The interpretation of non-coding variants (NCVs) from genome sequencing represents a major bottleneck in the diagnosis of rare diseases. Existing variant effect predictors (VEPs) show variable performance across different genomic contexts, and a lack of region-specific clinical guidance hinders accurate variant prioritization. This study aimed to rigorously benchmark state-of-the-art VEPs to define region-specific thresholds and to develop MobiDeep, a novel meta-score designed to improve NCV prioritization.
Methods We curated a high-confidence dataset of 448 pathogenic NCVs (ClinVar, HGMD, literature) and 38,146 presumed benign NCVs. Critically, variants affecting splicing were excluded to focus on strictly regulatory mechanisms. We benchmarked the performance of ReMM, CADD, GPN-MSA, Cactus241way, and phyloP, both globally and stratified by genomic region (e.g., 5'UTR, 3'UTR). Subsequently, we developed MobiDeep, a neural network integrating these five scores, optimized using Optuna and validated on an independent holdout set of pathogenic NCVs.
Results Benchmarking confirmed that no single tool is universally optimal, with performance varying significantly by genomic context: while ReMM excelled in non-coding exons (AUROC = 0.987), GPN-MSA demonstrated superior performance for 3'UTRs (AUROC = 0.901). We established data-driven clinical thresholds, identifying an optimal global cutoff of 10.37 for CADD v1.7, validating previous work that proposed CADD ≥ 10 for regulatory variants, and 0.80 for ReMM. Building on these insights, MobiDeep significantly outperformed all individual predictors on an independent test set, achieving an AUROC of 0.973 and an AUPRC of 0.888. In large-scale simulations mimicking a diagnostic setting, MobiDeep prioritized causal variants effectively, placing 52.0% and 75% within the top 5 and top 20 ranks, respectively. Furthermore, the model correctly prioritized all ClinVar pathogenic variants in the recently discovered RNU4-2 non-coding gene.
Conclusions Our findings confirm that individual predictors and uniform thresholds are insufficient for interpreting the diverse landscape of non-coding variants. We demonstrate that region-specific calibration is essential for accurate prioritization. Our meta-score MobiDeep improves classification performance compared to existing tools and serves as a robust filter to streamline the identification of high-confidence variants, thereby facilitating focused manual review and subsequent biological validation in diagnostic settings.
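To make the evaluation quantities in the abstract concrete, the following is a minimal Python sketch of how an AUROC, a data-driven score cutoff, and the rank of a causal variant could be computed for a single predictor. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how the thresholds were derived, so the use of Youden's J statistic here is an assumption, the function names are hypothetical, and the toy scores and labels are invented for the example rather than taken from the study.

```python
from typing import List, Tuple


def auroc(scores: List[float], labels: List[int]) -> float:
    """AUROC as the probability that a randomly chosen pathogenic variant
    scores higher than a randomly chosen benign one (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_threshold(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1,
    where a variant is called pathogenic when score >= cutoff.
    ASSUMPTION: Youden's J is one common criterion; the source does not
    state which criterion was actually used."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one variant of each class")
    best_cut, best_j = scores[0], -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


def rank_of_causal(scores: List[float], causal_index: int) -> int:
    """1-based rank of the causal variant when variants are sorted by
    descending score; ties are resolved pessimistically."""
    target = scores[causal_index]
    return 1 + sum(
        1 for i, s in enumerate(scores) if i != causal_index and s >= target
    )


# Invented toy data (1 = pathogenic, 0 = presumed benign); not study data.
toy_scores = [0.95, 0.80, 0.60, 0.70, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]

toy_auroc = auroc(toy_scores, toy_labels)
toy_cutoff, toy_j = youden_threshold(toy_scores, toy_labels)
toy_rank = rank_of_causal(toy_scores, causal_index=2)
```

On this toy input, 14 of the 15 pathogenic/benign pairs are ordered correctly, the cutoff that maximises J is 0.60, and the variant at index 2 ranks fourth. A "top 5" or "top 20" rate as reported in the abstract would then be the fraction of simulated cases in which the causal variant's rank is at most 5 or 20.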

Version published to 10.21203/rs.3.rs-8823759/v1 on Research Square
Mar 11, 2026

