scELMo: Embeddings from Language Models are Good Learners for Single-cell Data Analysis


Abstract

Various Foundation Models (FMs) have been built on the pre-training and fine-tuning framework to analyze single-cell data, with varying degrees of success. In this manuscript, we propose scELMo (Single-cell Embedding from Language Models), a method for analyzing single-cell data that uses Large Language Models (LLMs) to generate both descriptions of metadata and embeddings of those descriptions. We combine the LLM embeddings with the raw data under a zero-shot learning framework, and further extend the method with a fine-tuning framework to handle additional tasks. We demonstrate that scELMo is capable of cell clustering, batch-effect correction, and cell-type annotation without training a new model. Moreover, the fine-tuning framework of scELMo can help with more challenging tasks, including in-silico treatment analysis and perturbation modeling. scELMo has a lighter structure and lower resource requirements, yet performs comparably to recent large-scale FMs (such as scGPT [1] and Geneformer [2]) in our evaluations, suggesting a promising path for developing domain-specific FMs.
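The zero-shot combination of LLM embeddings with raw data can be illustrated with a short sketch. Assuming gene-level embeddings of LLM-generated gene descriptions are already available, one simple combination is an expression-weighted average of gene embeddings per cell. The array names, shapes, and the averaging scheme below are illustrative assumptions, not necessarily scELMo's exact procedure.

```python
# Minimal sketch of the zero-shot idea, assuming precomputed LLM embeddings
# of gene descriptions (shapes and the weighting scheme are assumptions).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-ins for real data: a 500-cell x 2000-gene expression matrix, and a
# 2000 x 1536 matrix of LLM embeddings of the gene descriptions.
expression = rng.poisson(1.0, size=(500, 2000)).astype(float)
gene_embeddings = rng.normal(size=(2000, 1536))

# Normalize expression per cell so each cell's gene weights sum to 1.
weights = expression / expression.sum(axis=1, keepdims=True)

# Zero-shot cell embedding: expression-weighted average of gene embeddings.
cell_embeddings = weights @ gene_embeddings  # shape: (500, 1536)

# The resulting cell embeddings can be used directly, e.g., for clustering.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(cell_embeddings)
print(labels[:10])
```

Because this pipeline requires no training, tasks such as clustering or annotation can be run directly on the combined embeddings; fine-tuning would build on top of such representations.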
