A ratio-based framework using Quartet reference materials for integrating long- and short-read RNA-seq


Abstract

Long-read RNA sequencing (lrRNA-seq) enables full-length transcript profiling but is confounded by technical batch effects that compromise quantification and prevent data integration across platforms, protocols, and laboratories. In addition, the lack of a transcriptome-wide biological ground truth has hindered objective benchmarking. To address both challenges, we leveraged certified Quartet reference materials to generate one of the largest multi-center lrRNA-seq resources to date: over one billion long reads from 144 libraries across four PacBio and Nanopore protocols in four independent laboratories. We first show that ratio-based quantification against built-in reference samples effectively removes technical noise, revealing the underlying biological signals. We then construct the first ratio-based reference datasets for full-length transcripts, comprising 10,218 isoforms and 6,032 alternative splicing (AS) events, and orthogonally validate them with RT–qPCR. Finally, a comprehensive benchmark using these reference datasets as ground truth shows that a hybrid strategy integrating long- and short-read data (hybrid-seq) achieves the highest quantification accuracy for both isoforms and AS events. Our work provides a foundational framework and resource for evaluating lrRNA-seq technologies and for accelerating the standardization of full-length transcriptomics in research and clinical applications.
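
To illustrate the general idea behind ratio-based quantification, the following is a minimal sketch and not the authors' pipeline. It assumes each batch contains a built-in reference sample profiled alongside the study samples, and that batch effects act as isoform-specific multiplicative biases shared by all samples in a batch. The function name `log2_ratio_to_reference`, the sample labels `REF` and `S1`, the isoform identifiers, and all expression values are hypothetical.

```python
import math


def log2_ratio_to_reference(expr, reference, pseudocount=0.01):
    """Convert absolute expression values to log2 ratios against a reference sample.

    expr: dict mapping sample name -> dict mapping isoform id -> expression value
    reference: name of the built-in reference sample within the same batch
    pseudocount: small constant added to avoid division by zero (an assumption;
                 the appropriate value depends on the quantification unit)
    """
    ref_values = expr[reference]
    ratios = {}
    for sample, values in expr.items():
        if sample == reference:
            continue  # the reference itself would be all zeros on the log2 scale
        ratios[sample] = {
            isoform: math.log2((value + pseudocount) / (ref_values[isoform] + pseudocount))
            for isoform, value in values.items()
        }
    return ratios


# Hypothetical batch A: one reference sample and one study sample.
batch_a = {
    "REF": {"iso1": 100.0, "iso2": 50.0},
    "S1": {"iso1": 200.0, "iso2": 25.0},
}

# Hypothetical batch B: the same biology, distorted by isoform-specific
# multiplicative biases (x3 for iso1, x0.5 for iso2) that affect every sample.
batch_b = {
    "REF": {"iso1": 300.0, "iso2": 25.0},
    "S1": {"iso1": 600.0, "iso2": 12.5},
}

# pseudocount=0.0 is used here so the multiplicative bias cancels exactly;
# with a nonzero pseudocount the cancellation is only approximate.
ratios_a = log2_ratio_to_reference(batch_a, "REF", pseudocount=0.0)
ratios_b = log2_ratio_to_reference(batch_b, "REF", pseudocount=0.0)

print(ratios_a)
print(ratios_b)
```

In this toy case the absolute values differ between the two batches, yet the log2 ratios for `S1` agree, because a bias shared by the study and reference samples cancels in the ratio. Real data also carry additive noise and sample-specific effects, so the cancellation would only be partial.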
