Accurate highly variable gene selection using RECODE in transcriptome data analysis

Abstract

Recent transcriptomics technologies enable gene-expression profiling at single-cell or micrometer-scale spatial resolution, but they capture only a small fraction of the RNA molecules actually present, introducing substantial technical noise driven by random sampling. This noise distorts the earliest analytical steps, namely dimensionality reduction and highly variable gene (HVG) selection, and its consequences propagate into downstream analyses. This study aims to address the issue by removing technical noise at its source. Here, I demonstrate that HVG selection based on RECODE, a denoising method grounded in high-dimensional statistical theory, outperforms widely used approaches on both scRNA-seq and spatial transcriptomics data. RECODE-based HVG selection achieves higher accuracy and robustness, avoids missing values, improves downstream performance, and provides the fastest runtime and best scalability among noise-reduction methods. These findings show that theory-driven noise removal is essential for recovering true biological signals and establish RECODE as a practical and reliable preprocessing strategy for single-cell analysis.
