Analyzing the Performance of Deep Learning Splice Prediction Algorithms

Nathan Fortier
Gabe Rudy
Andreas Scherer

Abstract

SpliceAI has become the leading computational tool for predicting splice-altering variants, but restrictive licensing has limited its adoption by commercial clinical laboratories. While open-source reimplementations have emerged with author-reported comparisons, independent benchmarking across diverse datasets is needed to establish their practical equivalence.

We compared the original SpliceAI algorithm against two open-source alternatives (OpenSpliceAI and CI-SpliceAI) across three independent benchmarks: a curated dataset of 1,316 functionally validated variants, 213 variants with experimental splice assay data, and 58,064 clinically classified variants from ClinVar. We also benchmarked all deep learning methods against a legacy ensemble of four traditional splice prediction algorithms (MaxEntScan, NNSplice, GeneSplicer, and PWM) to compare modern and conventional approaches directly. Across all benchmarks, the deep learning models consistently outperformed the ensemble of traditional algorithms. We evaluated sensitivity, specificity, and balanced accuracy for each algorithm, and performed statistical testing to assess the significance of performance differences. Additionally, we conducted a correlation analysis on 100,000 variants to quantify the concordance of splice scores and the agreement on splice site positions between implementations.
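The three evaluation metrics named above can be illustrated with a minimal Python sketch. It assumes each variant has a single predicted score and a binary ground-truth label; the function name `evaluate` and the 0.2 score threshold are hypothetical choices for illustration, not values stated in the abstract.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    balanced_accuracy: float


def evaluate(scores: Sequence[float], labels: Sequence[bool],
             threshold: float = 0.2) -> Metrics:
    """Compute sensitivity, specificity, and balanced accuracy.

    scores    -- predicted splice scores, one per variant
    labels    -- True if the variant is known to alter splicing
    threshold -- ASSUMED cutoff; the abstract does not state one
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    tp = fn = tn = fp = 0
    for score, is_positive in zip(scores, labels):
        predicted_positive = score >= threshold
        if is_positive and predicted_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative variant")

    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Balanced accuracy is the mean of the two rates, so it is not
    # inflated when one class dominates the benchmark.
    return Metrics(sensitivity, specificity, (sensitivity + specificity) / 2)


# Toy data only; these are not values from the study.
toy = evaluate([0.9, 0.1, 0.6, 0.05, 0.3],
               [True, True, True, False, False])
```

Balanced accuracy is a natural headline metric for benchmarks like these because the proportion of splice-altering variants can differ substantially between datasets.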

All three deep learning algorithms demonstrated comparable performance on the literature-curated benchmark (balanced accuracies: 89.5-90.7%) and the ClinVar dataset (88.9-89.5%). While both open-source solutions achieved a statistically significantly higher accuracy than SpliceAI on the ClinVar dataset, the magnitude of this improvement was small and unlikely to be of practical significance. On the functional splice assay dataset, the original SpliceAI achieved the highest accuracy (83.6%), while OpenSpliceAI showed significantly lower performance (74.6%, p = 0.019). Correlation analysis revealed that CI-SpliceAI maintained balanced concordance across splice event types (ρ = 0.786-0.883), whereas OpenSpliceAI exhibited asymmetric performance with stronger correlation for loss events (ρ = 0.924-0.940) than gain events (ρ = 0.668-0.677). Both implementations demonstrated high spatial agreement with SpliceAI, with exact splice site position match rates exceeding 90% for all event types. Together, these results demonstrate that both open-source reimplementations of SpliceAI successfully reproduce the predictive behavior of the original algorithm across multiple evaluation contexts, while consistently outperforming traditional splice prediction methods.
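The two concordance measures reported above can be sketched in Python as follows. This assumes that ρ denotes Spearman's rank correlation, which the abstract does not state explicitly, and that each implementation reports one score and one predicted splice site position per variant and event type. The function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length inputs with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


def position_match_rate(pos_a: Sequence[int], pos_b: Sequence[int]) -> float:
    """Fraction of variants where both tools predict the same position."""
    if len(pos_a) != len(pos_b) or not pos_a:
        raise ValueError("need two equal-length, non-empty inputs")
    return sum(1 for a, b in zip(pos_a, pos_b) if a == b) / len(pos_a)


# Toy data only; these are not values from the study.
rho = spearman_rho([0.1, 0.4, 0.2, 0.9], [0.2, 0.5, 0.1, 0.8])
match = position_match_rate([-2, 5, 0, 13], [-2, 5, 1, 13])
```

Computing the correlation separately for each event type, as the abstract describes, is what makes asymmetries such as the gain versus loss difference visible; a single pooled coefficient would hide them.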

Version published to 10.1101/2025.11.20.689501 on bioRxiv
Nov 20, 2025

Related articles

Benchmarking RNA-seq Tools for Real-World Diagnostic Applications

This article has 15 authors:
1. Sarah Silverstein
2. Kaushik Ganapathy
3. Sandra Donkervoort
4. Veronique Bolduc
5. Ying Hu
6. Justin Moy
7. Prech Uapinyoying
8. Svetlana Gorokhova
9. Vijay Ganesh
10. Ben Weisburd
11. Rotem OrBach
12. A. Reghan Foley
13. Pejman Mohammadi
14. David Adams
15. Carsten Bonnemann
This article has no evaluations
Latest version Jan 29, 2026

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations
Latest version Dec 11, 2025

Protein Language Models Rescue Variant Pathogenicity Prediction in Intrinsically Disordered Regions Through Synergistic Integration with Structure-Based Methods

This article has 1 author:
1. Hayden Farquhar
This article has no evaluations
Latest version Feb 4, 2026
