Tresor: An integrated platform for simulating transcriptomic reads with realistic PCR error representation across various RNA sequencing technologies


Abstract

The rapid advancement of high-throughput sequencing technologies has spurred the development of numerous computational tools designed to identify gene expression patterns from growing datasets at both bulk and single-cell sequencing levels. The recent advent of long-read sequencing technologies has further accelerated the availability and refinement of these tools. However, the lack of ground-truth labels and annotations in sequencing data presents a significant challenge for evaluating the efficacy of analytical tools. To address this, we developed Tresor, an integrated platform for simulating both short and long reads at bulk and single-cell levels. We devised a tree-based algorithm to significantly accelerate in silico experiments at high PCR cycle numbers. Tresor allows for customising sequencing libraries with highly modular and flexible read structures, facilitating the verification of sequencing-related biological discoveries. The tool can also introduce substitution, insertion, and deletion errors at the library preparation, PCR amplification, and sequencing stages, which lets it mimic real-world conditions across diverse experimental settings. Our results demonstrate that, upon removal of PCR duplicates, cell type-specific gene expression profiles derived from our simulated reads closely resemble reference data. We envisage that Tresor will provide valuable insights into a broader range of transcriptomics analyses and support the development of more effective algorithms for read alignment and UMI deduplication.
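To make the idea of PCR error simulation concrete, the following is a minimal sketch of a naive amplification loop in which every molecule is copied once per cycle and each copied base may be substituted. It is not Tresor's tree-based algorithm or its API; the function name `amplify`, its parameters, and the uniform per-base error rate are all assumptions chosen for illustration, and insertions and deletions are omitted. Because the pool doubles every cycle, the cost of this approach grows exponentially with the number of cycles.

```python
import random

BASES = "ACGT"

def amplify(template, cycles, error_rate, rng):
    """Naive PCR sketch (hypothetical helper, not part of Tresor).

    Each cycle copies every molecule in the pool once. While copying,
    each base is replaced by a different base with probability error_rate.
    """
    pool = [template]
    for _ in range(cycles):
        copies = []
        for molecule in pool:
            copy = "".join(
                rng.choice([b for b in BASES if b != base])
                if rng.random() < error_rate
                else base
                for base in molecule
            )
            copies.append(copy)
        # Originals are kept, so the pool size doubles each cycle.
        pool.extend(copies)
    return pool

rng = random.Random(0)  # fixed seed so the run is reproducible
pool = amplify("ACGTACGTAC", cycles=5, error_rate=0.01, rng=rng)
n_mutated = sum(1 for m in pool if m != "ACGTACGTAC")
print(len(pool), "molecules,", n_mutated, "carrying at least one error")
```

Errors introduced in an early cycle are inherited by all descendants of that copy, which is why sequences such as UMIs can diverge from their originals and complicate deduplication.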
