Evaluating the Utilities of Foundation Models in Single-cell Data Analysis

Tianyu Liu
Kexing Li
Yuge Wang
Hongyu Li
Hongyu Zhao

Listed in

Evaluated articles (Arcadia Science)

Abstract

Foundation Models (FMs) have made significant strides in both industrial and scientific domains. In this paper, we evaluate the performance of FMs for single-cell sequencing data analysis through comprehensive experiments across eight downstream tasks pertinent to single-cell data. Among the ten single-cell FMs considered, the top models in terms of performance and user accessibility are scGPT, Geneformer, and CellPLM. However, by comparing these FMs with task-specific methods, we found that single-cell FMs do not consistently outperform task-specific methods across all tasks, which challenges the necessity of developing foundation models for single-cell analysis. In addition, we evaluated the effects of hyper-parameters, initial settings, and stability on the training of single-cell FMs using our proposed scEval framework, and we provide guidelines for pre-training and fine-tuning to enhance the performance of single-cell FMs. Our work summarizes the current state of single-cell FMs, points to their constraints and avenues for future development, and offers a freely available evaluation pipeline to benchmark new models and improve method development.

Arcadia Science
Feb 29, 2024

scGPT v1 outperformed the scGPT model overall, raising the issue of the need for increasing the size of pre-training datasets for this task

Wasn't scGPT v1, which outperformed scGPT, trained on a smaller pre-training dataset?

Version published to 10.1101/2023.09.08.555192 on bioRxiv
Sep 8, 2023

Related articles

Out-of-the-box bioinformatics capabilities of large language models (LLMs)

This article has 2 authors:
1. Varsha Rajesh
2. Geoffrey H. Siwo
This article has no evaluations
Latest version Aug 27, 2025

Unlocking biological insight from single-cell data with an interpretable dual-stream foundation model

This article has 8 authors:
1. Honglie Guo
2. Qinghang Cui
3. Xiang Zhang
4. Chaowei Chen
5. Weihua Zheng
6. Changfeng Cai
7. Xinyi Wang
8. Shunfang Wang
This article has no evaluations
Latest version Sep 11, 2025

Gene-Family Encoding Boosts Domain-Adapted Single-Cell Language Models

This article has 8 authors:
1. Haoran Ma
2. Chang Xu
3. Shamaine Wei Ting Ho
4. Joseph J Zhao
5. Yunqiang Chu
6. Angie Lay Keng Tan
7. Raghav Sundar
8. Patrick Tan
This article has no evaluations
Latest version Sep 19, 2025
