Analyzing the Performance of Deep Learning Splice Prediction Algorithms
Abstract
SpliceAI has become the leading computational tool for predicting splice-altering variants, but restrictive licensing has limited its adoption by commercial clinical laboratories. While open-source reimplementations have emerged with author-reported comparisons, independent benchmarking across diverse datasets is needed to establish their practical equivalence.
We compared the original SpliceAI algorithm against two open-source alternatives (OpenSpliceAI and CI-SpliceAI) across three independent benchmarks: a curated dataset of 1,316 functionally validated variants, 213 variants with experimental splice assay data, and 58,064 clinically classified variants from ClinVar. All deep-learning methods were also compared with a legacy ensemble of four traditional splice prediction algorithms (MaxEntScan, NNSplice, GeneSplicer, and PWM), enabling direct comparison between modern and conventional approaches. Across all benchmarks, the deep-learning models consistently outperformed the ensemble of traditional algorithms. We evaluated sensitivity, specificity, and balanced accuracy for each algorithm, and performed statistical testing to assess the significance of performance differences. Additionally, we conducted a correlation analysis on 100,000 variants to quantify the concordance of splice scores and the agreement on splice site positions between implementations.
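As an illustration of the evaluation metrics named above, the following minimal sketch computes sensitivity, specificity, and balanced accuracy from per-variant scores and binary truth labels. The score cutoff of 0.2, the function names, and the toy data are assumptions made for this example only; the abstract does not state which threshold or tooling the authors used.

```python
def confusion_counts(scores, labels, threshold=0.2):
    """Count TP, FP, TN, FN for scores thresholded at `threshold`.

    `threshold` is a hypothetical cutoff chosen for illustration;
    it is not taken from the study.
    """
    tp = fp = tn = fn = 0
    for score, is_splice_altering in zip(scores, labels):
        predicted_altering = score >= threshold
        if predicted_altering and is_splice_altering:
            tp += 1
        elif predicted_altering and not is_splice_altering:
            fp += 1
        elif not predicted_altering and is_splice_altering:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Return (sensitivity, specificity, balanced accuracy)."""
    sensitivity = tp / (tp + fn)  # fraction of true splice-altering variants detected
    specificity = tn / (tn + fp)  # fraction of neutral variants correctly passed
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Toy data (invented): one score per variant and its validated label.
toy_scores = [0.9, 0.6, 0.1, 0.05, 0.3, 0.0, 0.8, 0.15]
toy_labels = [True, True, True, False, False, False, True, False]

counts = confusion_counts(toy_scores, toy_labels)
sens, spec, bal_acc = summarize(*counts)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} balanced_accuracy={bal_acc:.2f}")
```

Balanced accuracy averages sensitivity and specificity, so it is not inflated when one class (for example, benign variants) dominates a benchmark.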
All three deep learning algorithms demonstrated comparable performance on the literature-curated benchmark (balanced accuracies: 89.5-90.7%) and the ClinVar dataset (88.9-89.5%). While both open-source solutions achieved a statistically significantly higher accuracy than SpliceAI on the ClinVar dataset, the magnitude of this improvement was small and unlikely to be of practical significance. On the functional splice assay dataset, the original SpliceAI achieved the highest accuracy (83.6%), while OpenSpliceAI showed significantly lower performance (74.6%, p = 0.019). Correlation analysis revealed that CI-SpliceAI maintained balanced concordance across splice event types (ρ = 0.786-0.883), whereas OpenSpliceAI exhibited asymmetric performance with stronger correlation for loss events (ρ = 0.924-0.940) than gain events (ρ = 0.668-0.677). Both implementations demonstrated high spatial agreement with SpliceAI, with exact splice site position match rates exceeding 90% for all event types. Together, these results demonstrate that both open-source reimplementations of SpliceAI successfully reproduce the predictive behavior of the original algorithm across multiple evaluation contexts, while consistently outperforming traditional splice prediction methods.
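The concordance measures reported above can be sketched as follows, assuming that ρ denotes Spearman's rank correlation and that spatial agreement is the fraction of variants for which two implementations predict the same splice site position. The function names and toy values are hypothetical and serve only to show the calculation; they do not reproduce the authors' pipeline or data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def position_match_rate(positions_a, positions_b):
    """Fraction of variants with identical predicted splice site positions."""
    matches = sum(a == b for a, b in zip(positions_a, positions_b))
    return matches / len(positions_a)


# Toy data (invented): scores and predicted positions from two implementations.
reference_scores = [0.10, 0.40, 0.80, 0.95, 0.02]
reimplementation_scores = [0.12, 0.35, 0.90, 0.85, 0.01]
reference_positions = [-2, 5, 0, 13, -1]
reimplementation_positions = [-2, 5, 0, 12, -1]

rho = spearman_rho(reference_scores, reimplementation_scores)
match_rate = position_match_rate(reference_positions, reimplementation_positions)
print(f"rho={rho:.3f} match_rate={match_rate:.2f}")
```

Because a rank correlation depends only on the ordering of scores, it captures whether two implementations prioritize the same variants even when their absolute score scales differ slightly.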