Benchmarking Static Gene Regulatory Network Reconstruction and Dynamic Transition Probing in Single-Cell Foundation Models


Abstract

Single-cell foundation models may encode gene regulatory information, but it remains unclear which model components capture this signal and how it compares with conventional inference methods. Here, we introduce a unified benchmark that evaluates gene regulatory network (GRN) reconstruction from six single-cell foundation models and three classical baselines across six datasets and four reference network types. We disentangle three sources of regulatory signal within each model: pretrained token embeddings, final-layer hidden states, and attention-derived scores. Under a strict zero-shot setting, scGPT token-embedding similarity outperforms classical baselines on STRING and ChIP-seq references, recovers core transcription factors, and best preserves reference network topology. However, static GRNs cannot test whether learned gene–gene relationships are predictive of expression dynamics. We therefore introduce dynamic transition probing, which iteratively applies a model’s reconstruction head to drive early-cell profiles toward late-cell states without temporal supervision. We find that pretrained models capture meaningful developmental transitions, with scFoundation showing the strongest overall performance. Together, our results show that single-cell foundation models encode transferable regulatory and dynamical priors, but how well these priors can be recovered depends on model architecture, pretraining design, and extraction strategy.
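The two extraction ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `embedding_similarity_edges` and `transition_probe`, the use of cosine similarity as the scoring rule, the `top_k` cutoff, and the `reconstruct` callable standing in for a model's reconstruction head are all assumptions made for illustration, and the data below is a toy example.

```python
import numpy as np

def embedding_similarity_edges(emb, genes, top_k=3):
    """Rank gene pairs by cosine similarity of their token embeddings.

    emb   : (n_genes, dim) array of gene embeddings (hypothetical input).
    genes : list of n_genes gene names.
    Returns the top_k undirected edges as (gene_a, gene_b, score).
    """
    emb = np.asarray(emb, dtype=float)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sim = unit @ unit.T                       # pairwise cosine similarity
    iu, ju = np.triu_indices(len(genes), k=1) # each unordered pair once
    order = np.argsort(-sim[iu, ju])[:top_k]  # highest scores first
    return [(genes[iu[o]], genes[ju[o]], float(sim[iu[o], ju[o]]))
            for o in order]

def transition_probe(x_early, reconstruct, n_steps=5):
    """Iteratively apply a reconstruction function to an early-cell profile.

    reconstruct is an assumed callable mapping an expression vector to a
    reconstructed expression vector; it stands in for a model head.
    Returns an (n_steps + 1, n_genes) array holding the trajectory.
    """
    x = np.asarray(x_early, dtype=float)
    trajectory = [x]
    for _ in range(n_steps):
        x = np.asarray(reconstruct(x), dtype=float)
        trajectory.append(x)
    return np.stack(trajectory)

# Toy usage: three genes with 2-d embeddings; A and B point the same way.
genes = ["A", "B", "C"]
emb = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
edges = embedding_similarity_edges(emb, genes, top_k=2)

# Toy reconstruction function that pulls profiles toward a "late" state.
late = np.array([2.0, 0.0, 1.0])
traj = transition_probe(np.zeros(3), lambda x: 0.5 * x + 0.5 * late)
```

In a benchmark of this kind, the ranked edges would be compared against a reference network, and the final trajectory state against held-out late-cell profiles; those evaluation steps are omitted here.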
