Benchmarking single-cell foundation models for real-world RNA-seq data integration

Siyu Han
Tamas Sztanka-Toth
Enes Senel
Ahmed Elnaggar
Jaymala Patel
Tommaso Mansi
Denis Smirnov
Joel Greshock
Alex Javidi

Abstract

Single-cell foundation models enable reusable representations and streamlined analysis workflows, yet rigorous evaluation of their performance and robustness in real-world pharmaceutical settings remains underexplored. Here, we benchmarked leading single-cell foundation models (scGPT; scGPT_CP, a continually pretrained checkpoint of scGPT; scFoundation; scMulan; CellFM) against established baseline methods (scVI; Harmony) for data integration using over 1.5 million cells from clinical and preclinical samples. Performance was assessed using well-established and complementary metrics for technical correction and biological structure preservation. We further introduced robustness-oriented rankings to summarize metric trade-offs and quantify performance consistency across datasets and evaluation settings. Our findings show that fine-tuning improved technical correction performance; among the foundation models, fine-tuned scGPT_CP performed best. However, the baseline scVI was the top overall performer, ranking first by our multi-metric Leximax ranking and achieving the highest Pareto Front-1 hit. Collectively, our study provides practical insights for adapting foundation models to real-world drug design and development.
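The abstract names two multi-metric summaries, a Leximax ranking and Pareto Front-1 membership, without defining them. The sketch below illustrates one common reading of each and is not the preprint's implementation: a method is on Pareto front 1 if no other method is at least as good on every metric and strictly better on one, and leximax orders methods by comparing their worst loss first, then the second worst, and so on. The method names, the two metrics, the scores, and the conversion of scores to losses as 1 - score are all hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every metric and
    strictly better on at least one (higher scores are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front_1(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Names of methods that no other method dominates."""
    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    )


def leximax_order(losses: Dict[str, Sequence[float]]) -> List[str]:
    """Order methods best to worst by leximax on losses (lower is better):
    sort each loss vector from worst to best, then compare lexicographically."""
    return sorted(losses, key=lambda name: sorted(losses[name], reverse=True))


# Hypothetical scores in [0, 1] for two made-up metrics
# (technical correction, biological preservation). Not results from the preprint.
toy_scores = {
    "method_a": (0.9, 0.6),
    "method_b": (0.7, 0.8),
    "method_c": (0.6, 0.5),
}

# Assumption: turn scores into losses as 1 - score so that leximax applies.
toy_losses = {name: [1.0 - v for v in s] for name, s in toy_scores.items()}

front = pareto_front_1(toy_scores)
ranking = leximax_order(toy_losses)
print("Pareto front 1:", front)
print("Leximax order:", ranking)
```

In this toy case the two summaries answer different questions: the Pareto front keeps every method that represents a defensible trade-off, while leximax picks a single order by favoring the method whose weakest metric is least bad. A "Pareto Front-1 hit" count, as the abstract's wording suggests, would presumably tally how often a method lands on the front across datasets or settings, though the abstract does not spell this out.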

Version published to 10.64898/2026.04.17.719314 on bioRxiv
Apr 21, 2026

Related articles

CellBench-LS: Benchmark Evaluation of Single-cell Foundation Models for Low-supervision Scenarios

This article has 5 authors:
1. Yongjie Xu
2. Yiyun Li
3. Yue Yuan
4. Chang Yu
5. Zelin Zang
This article has no evaluations
Latest version Apr 5, 2026

A Systematic Evaluation of Single-Cell Batch Integration Metrics and sBEE: A Robust New Metric

This article has 4 authors:
1. Mekan Myradov
2. Aissa Houdjedj
3. Oznur Tastan
4. Hilal Kazan
This article has no evaluations
Latest version Apr 24, 2026

A transcriptomics-native foundation model for universal cell representation and virtual cell synthesis

This article has 2 authors:
1. Xiaohui Jiang
2. Jichun Xie
This article has no evaluations
Latest version Apr 14, 2026
