StringTie3 Improves Total RNA-seq Assembly by Resolving Nascent and Mature Transcripts
Abstract
Accurate assembly of rRNA-depleted (total) RNA-seq remains challenging because existing methods often conflate incomplete, nascent RNA with fully processed mature isoforms, leading to misassemblies and quantification errors that skew downstream analyses. Here, we present StringTie3, a major update to the widely used StringTie assembler, specifically designed for total RNA-seq. This new version introduces two key innovations: (1) a nascent mode that models co-transcriptional splicing to separate nascent from mature transcripts, and (2) a refined long-read module that distinguishes genuine polyadenylation sites from poly(A)-priming artifacts. Across short-, long-, and hybrid-read datasets, StringTie3 substantially reduces assembly errors and outperforms existing tools, boosting precision by up to 20% in short-read total RNA-seq and improving sensitivity and precision by as much as 37% and 75%, respectively, in long-read assemblies. In Argonaute knockout experiments, nascent-mode analysis shows that single knockouts predominantly alter nascent transcripts while leaving mature RNA largely unchanged, whereas double or triple knockouts disrupt both fractions. Applying this approach to breast cancer samples shows that, although nascent and mature RNA levels often correlate, certain extracellular matrix and tumor suppressor genes deviate from this pattern, suggesting post-transcriptional regulation. By accurately reconstructing transcriptomes and distinguishing nascent from mature RNA, StringTie3 reveals hidden layers of RNA regulation and provides a powerful framework for investigating transcriptional and post-transcriptional processes in total RNA-seq data.