Benchmarking a foundational cell model for post-perturbation RNAseq prediction

Gerold Csendes
Kristóf Z. Szalay
Bence Szalai

Abstract

Accurately predicting cellular responses to perturbations is essential for understanding cell behaviour in both healthy and diseased states. While perturbation data is ideal for building such predictive models, it is considerably sparser than baseline (non-perturbed) cellular data. To address this limitation, several foundational cell models have been developed using large-scale single-cell gene expression data. After pre-training, these models are fine-tuned for specific tasks, such as predicting post-perturbation gene expression profiles, and are considered state-of-the-art for these problems. However, proper benchmarking of these models remains an unsolved challenge.

In this study, we benchmarked a recently published foundational model, scGPT, against baseline models. Surprisingly, we found that even the simplest baseline model, which takes the mean of the training examples, outperformed scGPT. Furthermore, machine learning models that incorporate biologically meaningful features outperformed scGPT by a large margin. Additionally, we found that the current Perturb-Seq benchmark datasets exhibit low perturbation-specific variance, making them suboptimal for evaluating such models.
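To make the two ideas in this paragraph concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' code: the function names `mean_baseline` and `perturbation_variance_fraction` are hypothetical, the toy matrices are invented for illustration, and the between-group over total sum of squares is only one plausible way to quantify "perturbation-specific variance". The abstract does not state which metric the study used.

```python
import numpy as np


def mean_baseline(train_profiles: np.ndarray) -> np.ndarray:
    """Predict one expression profile for any held-out perturbation.

    train_profiles has shape (n_training_examples, n_genes). The prediction
    is the gene-wise mean of the training examples, so it ignores which
    perturbation is being predicted.
    """
    return train_profiles.mean(axis=0)


def perturbation_variance_fraction(profiles: np.ndarray, labels: np.ndarray) -> float:
    """Share of total variance attributable to perturbation identity.

    Assumption: computed as between-group sum of squares divided by total
    sum of squares, summed over genes. A value near 0 means the profiles
    barely depend on the perturbation label.
    """
    grand_mean = profiles.mean(axis=0)
    total = ((profiles - grand_mean) ** 2).sum()
    between = 0.0
    for label in np.unique(labels):
        group = profiles[labels == label]
        between += len(group) * ((group.mean(axis=0) - grand_mean) ** 2).sum()
    return float(between / total)


# Toy data (invented): 4 profiles, 2 genes, 2 perturbations.
toy_profiles = np.array([[0.0, 0.0], [0.0, 0.0], [2.0, 2.0], [2.0, 2.0]])
toy_labels = np.array([0, 0, 1, 1])

baseline_prediction = mean_baseline(toy_profiles)
variance_fraction = perturbation_variance_fraction(toy_profiles, toy_labels)
```

When the variance fraction is low, a label-agnostic predictor such as `mean_baseline` is already close to every target profile, which is one way to read the abstract's point that such datasets are poorly suited to telling models apart.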

Our results highlight important limitations in current benchmarking approaches and provide insights into more effectively evaluating post-perturbation gene expression prediction models.

Version published to 10.1101/2024.09.30.615843 on bioRxiv
Oct 1, 2024

Related articles

Discovering cell types and states from reference atlases with heterogeneous single-cell ATAC-seq features

This article has 2 authors:
1. Xiuwei Zhang
2. Yuqi Cheng
This article has no evaluations. Latest version: Dec 10, 2025

What Drives GNN Performance in Tissue Dynamics? Insights from Vertex-Model Simulations

This article has 7 authors:
1. Matej Krajnc
2. Troy Comi
3. Siqi Miao
4. Adnan Hafeez
5. Hadar Serviansky
6. Pan Li
7. Tomer Stern
This article has no evaluations. Latest version: Jan 20, 2026

Attention-Based Hierarchical Graph Autoencoder for Dose-Specific Single-Cell Resistance Dynamics

This article has 3 authors:
1. Sachit Satyal
2. Teng Long
3. Jean Gao
This article has no evaluations. Latest version: Dec 11, 2025
