X-Atlas/Orion: Genome-wide Perturb-seq Datasets via a Scalable Fix-Cryopreserve Platform for Training Dose-Dependent Biological Foundation Models
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
The rapid expansion of massively parallel sequencing technologies has enabled the development of foundation models to uncover novel biological findings. While these have the potential to significantly accelerate scientific discoveries by creating AI-driven virtual cell models, their progress has been greatly limited by the lack of large-scale high-quality perturbation data, which remains constrained due to scalability bottlenecks and assay variability. Here, we introduce “Fix-Cryopreserve-ScRNAseq” (FiCS) Perturb-seq, an industrialized platform for scalable Perturb-seq data generation. We demonstrate that FiCS Perturb-seq exhibits high sensitivity and low batch effects, effectively capturing perturbation-induced transcriptomic changes and recapitulating known biological pathways and protein complexes. In addition, we release X-Atlas: Orion edition (X-Atlas/Orion), the largest publicly available Perturb-seq atlas. This atlas, generated from two genome-wide FiCS Perturb-seq experiments targeting all human protein-coding genes, comprises eight million cells deeply sequenced to over 16,000 unique molecular identifiers (UMIs) per cell. Furthermore, we show that single guide RNA (sgRNA) abundance can serve as a proxy for gene knockdown (KD) efficacy. Leveraging the deep sequencing and substantial cell numbers per perturbation, we also show that stratification by sgRNA expression can reveal dose-dependent genetic effects. Taken together, we demonstrate that FiCS Perturb-seq is an efficient and scalable platform for high-throughput Perturb-seq screens. Through the release of X-Atlas/Orion, we highlight the potential of FiCS Perturb-seq to address current scalability and variability challenges in data generation, advance foundation model development that incorporates gene-dosage effects, and accelerate biological discoveries.