Amaranth: Enhanced Single-Cell Transcript Assembly via Discriminative Modeling of UMI Reads and Internal Reads
Abstract
Single-cell RNA sequencing (scRNA-seq) has transformed transcriptome profiling at cellular resolution. A central yet unsolved challenge is the accurate reconstruction of full-length transcripts in individual cells. Emerging scRNA-seq protocols can produce reads that span entire transcripts, enabling isoform-level expression analysis. For example, Smart-seq protocols combine UMI-linked reads, whose unique molecular identifiers (UMIs) index and stitch together multiple reads from the same molecule, with internal reads that fill coverage gaps. Here, we report a previously unrecognized phenomenon: UMI reads and internal reads exhibit markedly different biological and statistical properties in strandedness, 5’/3’ coverage bias, and locality distribution. Existing assemblers fail to leverage these distinctions and consequently yield suboptimal assemblies. We show that modeling UMI reads and internal reads discriminatively can drastically increase assembly accuracy. Based on these insights, we developed Amaranth, a new single-cell assembler that implements a set of heuristics specifically designed to address the distinct biases of UMI-linked and internal reads. These heuristics enable accurate assignment of strandedness to internal reads, reliable refinement of the splicing graph, and precise determination of transcript start and end sites, collectively yielding substantial improvements. We also developed Amaranth-meta, which integrates all cells to produce improved assemblies for individual cells. Benchmarked on Smart-seq3 datasets from human and mouse, Amaranth outperformed other state-of-the-art assemblers both in assembling individual cells and in meta-assembly. Amaranth advances isoform-level analysis in single-cell transcriptomics, facilitating detailed studies at cellular resolution.
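Any discriminative treatment of the two read classes starts by telling them apart. The sketch below illustrates one way to partition raw read sequences into UMI reads and internal reads and to group UMI reads by molecule. It is not Amaranth's implementation. It assumes the Smart-seq3 layout in which a 5' UMI read begins with the 11-nt tag `ATTGCGCAATG`, followed by an 8-nt UMI and the template-switching `GGG`. The function names `classify_read` and `group_by_umi` and the one-mismatch tag tolerance are hypothetical choices made for illustration.

```python
from collections import defaultdict

# Assumed Smart-seq3 read layout (illustrative, not taken from Amaranth):
# [11-nt tag][8-nt UMI][GGG from template switching][cDNA insert ...]
TAG = "ATTGCGCAATG"
UMI_LEN = 8
TSO_GGG = "GGG"


def classify_read(seq, max_mismatch=1):
    """Return ("umi", umi, insert) for tagged reads, else ("internal", None, seq).

    Hypothetical helper: a read counts as a UMI read if it is long enough to
    hold the tag plus the UMI and its first 11 bases match TAG with at most
    `max_mismatch` substitutions.
    """
    if len(seq) < len(TAG) + UMI_LEN:
        return "internal", None, seq
    mismatches = sum(a != b for a, b in zip(seq[:len(TAG)], TAG))
    if mismatches > max_mismatch:
        return "internal", None, seq
    umi = seq[len(TAG):len(TAG) + UMI_LEN]
    insert = seq[len(TAG) + UMI_LEN:]
    # Trim exactly one template-switching GGG if present; stripping every
    # leading G would also remove genuine transcript bases.
    if insert.startswith(TSO_GGG):
        insert = insert[len(TSO_GGG):]
    return "umi", umi, insert


def group_by_umi(reads):
    """Split (name, seq) pairs into UMI groups and a list of internal reads.

    Simplification: reads are grouped by UMI sequence only. A real pipeline
    would also key on gene or alignment locus and handle UMI sequencing errors.
    """
    groups = defaultdict(list)
    internal = []
    for name, seq in reads:
        kind, umi, insert = classify_read(seq)
        if kind == "umi":
            groups[umi].append((name, insert))
        else:
            internal.append((name, seq))
    return dict(groups), internal
```

A usage example under the same assumptions:

```python
reads = [
    ("r1", "ATTGCGCAATG" + "ACGTACGT" + "GGG" + "TTTTCCCC"),
    ("r2", "CCCCAAAATTTTGGGG"),
    ("r3", "ATTGCGCAATG" + "ACGTACGT" + "GGG" + "AAAA"),
]
groups, internal = group_by_umi(reads)
# groups   == {"ACGTACGT": [("r1", "TTTTCCCC"), ("r3", "AAAA")]}
# internal == [("r2", "CCCCAAAATTTTGGGG")]
```

Only UMI reads carry molecule identity and a known 5' origin in this layout. Internal reads carry neither, which is why properties such as their strandedness have to be inferred separately.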