DeepSAP: Improved RNA-Seq Alignment by Integrating Transcriptome Guidance with Transformer-Based Splice Junction Scoring
Abstract
Advances in high-throughput sequencing have revolutionized transcriptomics, providing unprecedented insights into gene expression, splicing events, and fusions. Despite these advances, the analysis of RNA-seq data remains challenging because of complex splice junctions, multi-mapped reads, and chimeric events. In this study, we present DeepSAP, an approach that improves the accuracy of RNA-seq alignment by integrating Transcriptome-Guided Genomic Alignment, as implemented in GSNAP, with improved splice junction scoring predicted by a transformer-based deep learning model. Our work demonstrates synergy between these methods, resulting in enhanced detection of splice junctions, identification of indels, and resolution of complex splicing patterns. On a standard benchmark of simulated human datasets, DeepSAP achieves the highest mean F1 score (0.971) for splice junction detection, outperforming DRAGEN (0.933), novoSplice (0.914), STAR (0.821), Subjunc (0.770), and HISAT2 (0.662). By combining transcriptome-guided alignment with large language models, our splice junction scoring approach captures intricate sequence patterns surrounding splice donor and acceptor sites, providing a significant advance in RNA-seq data analysis.
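For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall over detected splice junctions. The sketch below is a minimal illustration, not the benchmark's actual evaluation code: it assumes a junction counts as correct only when its chromosome and both intron coordinates match the ground truth exactly, and the `junction_f1` helper and the toy junction sets are hypothetical.

```python
from typing import Set, Tuple

# Assumed representation: (chromosome, intron start, intron end).
Junction = Tuple[str, int, int]


def junction_f1(predicted: Set[Junction], truth: Set[Junction]) -> float:
    """Return the F1 score of predicted junctions against a truth set.

    Assumes exact coordinate matching; the benchmark's real matching
    rule may differ.
    """
    true_pos = len(predicted & truth)
    if true_pos == 0:
        # Covers empty inputs and the case of no overlap at all.
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(truth)
    return 2 * precision * recall / (precision + recall)


# Toy data for illustration only; not taken from the paper.
truth = {
    ("chr1", 100, 200),
    ("chr1", 300, 400),
    ("chr2", 50, 150),
    ("chr2", 500, 900),
}
predicted = {
    ("chr1", 100, 200),
    ("chr1", 300, 400),
    ("chr2", 50, 150),
    ("chr3", 10, 20),  # false positive
}

score = junction_f1(predicted, truth)
print(f"F1 = {score:.3f}")
```

Here three of four predictions are correct and three of four true junctions are recovered, so precision and recall are both 0.75 and so is F1. A mean F1 such as the one quoted above would then be an average of this per-dataset value across the benchmark's datasets.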