scYeast: a Biological-knowledge-guided Foundation Model on Yeast Single-Cell Transcriptomics

Abstract

Although large-scale pre-trained models are central to foundational cell modeling, most focus on human or mouse systems, neglecting model organisms such as yeast (Saccharomyces cerevisiae), and they often fail to use existing biological prior knowledge effectively. To address this, we present scYeast, the first foundational cell model tailored to yeast that directly embeds biological priors. scYeast employs a novel asymmetric parallel architecture that infuses transcriptional regulatory information into the Transformer's attention mechanism, so that established biological knowledge is exploited during training. Pre-trained on large-scale yeast single-cell transcriptomics data, scYeast shows strong generalization and biological interpretability. It performs well on zero-shot tasks such as inferring regulatory relationships and identifying critical cell states. After fine-tuning, scYeast performs strongly across diverse tasks, from cell type classification to predicting growth doubling time and gene perturbation responses. Through transfer learning, scYeast can also be adapted to other omics data types, such as proteomics, broadening its utility for big-data analysis. Overall, scYeast is a powerful tool for yeast single-cell biology research: it sets a new standard for integrating foundation models with biological prior knowledge, should substantially accelerate discovery in yeast synthetic and systems biology, and provides a replicable framework for other organisms.
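The abstract does not specify how regulatory information enters the attention mechanism. One generic way to inject a prior gene-regulatory graph into self-attention is to add a bias to the attention logits wherever a known regulator-target edge exists. The sketch below illustrates only that generic idea with a single NumPy attention head; it is not scYeast's asymmetric parallel architecture. The function name `prior_biased_attention`, the directed 0/1 `prior` matrix (row = target gene, column = regulator), and the scalar weight `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row maximum before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def prior_biased_attention(X, W_q, W_k, W_v, prior, alpha=1.0):
    """Single-head self-attention over gene tokens with an additive prior bias.

    Hypothetical illustration, not the scYeast implementation.
    X:     (n_genes, d_model) gene-token embeddings for one cell
    prior: (n_genes, n_genes) directed matrix; prior[i, j] = 1 if gene j is a
           known regulator of gene i (assumed encoding), else 0
    alpha: assumed scalar weight controlling how strongly the prior is applied
    Returns (output, attention_weights).
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d_k)        # standard scaled dot-product scores
    logits = logits + alpha * prior        # push attention toward known regulators
    A = softmax(logits, axis=-1)           # each row sums to 1
    return A @ V, A


# Usage: 4 toy genes; gene 0 is assumed to regulate genes 1 and 2.
rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
prior = np.zeros((n, n))
prior[1, 0] = 1.0
prior[2, 0] = 1.0

out_plain, A_plain = prior_biased_attention(X, W_q, W_k, W_v, prior, alpha=0.0)
out_prior, A_prior = prior_biased_attention(X, W_q, W_k, W_v, prior, alpha=5.0)
```

With `alpha=0.0` the head reduces to ordinary attention; with a positive `alpha`, targets 1 and 2 place more attention on their assumed regulator (gene 0), while rows with no prior edges are unchanged. Because the prior is directional, the bias matrix need not be symmetric.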