Population-scale interpretation of RNA isoform diversity enabled by Isopedia
Abstract
Alternative splicing generates extensive transcriptomic complexity, yet “novelty” is often inflated because of incomplete reference annotations, with 20-70% of transcripts in RNA-Seq studies labeled as novel. Isopedia provides an expandable data structure for reference-agnostic isoform annotation, which we demonstrate here through a population-scale catalog of 1,007 long-read datasets spanning 37 diverse biological contexts. By transitioning from reference-dependent to evidence-weighted annotation, Isopedia provides the frequency-based context necessary to distinguish stochastic noise from biologically active isoforms. In HG002 benchmarks, Isopedia reduced apparent isoform novelty by up to 26-fold, achieving a >95% annotation rate even for low-abundance isoforms typically missed by standard catalogs. The framework further supports systematic exploration of challenging loci such as pseudogenes and gene fusions. Isopedia transforms isoform discovery into a systematic interpretation of the human transcriptome, providing a critical foundation for clinical and functional RNA research. Isopedia is open source and freely available: https://github.com/zhengxinchang/isopedia