How Do Large Language Models Understand Genes and Cells

Chen Fang
Yidong Wang
Yunze Song
Qingqing Long
Wang Lu
Linghui Chen
Guihai Feng
Yuanchun Zhou
Xin Li

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Researching genes and their interactions is crucial for deciphering the fundamental laws of cellular activity, advancing disease treatment, drug discovery, and more. Large language Models (LLMs), with their profound text comprehension and generation capabilities, have made significant strides across various natural science fields. However, their application in cell biology remains limited and a systematic evaluation of their performance is lacking. To address this gap, in this article, we select seven mainstream LLMs and evaluate their performance across nine gene-related problem scenarios. Our findings indicate that LLMs possess a certain level of understanding of genes and cells, but still lag behind domain-specific models in comprehending transcriptional expression profiles. Moreover, we have improved the current method of textual representation of cells, enhancing the LLMs’ ability to tackle cell annotation tasks. We encourage cell biology researchers to leverage LLMs for problem-solving while being mindful of the associated challenges. We release our code and data at https://github.com/epang-ucas/Evaluate_LLMs_to_Genes .

Version published to 10.1145/3702234
Nov 24, 2025
Version published to 10.1101/2024.03.23.586383 on bioRxiv
Mar 27, 2024

Understanding Pathways in Bioinformatics, Genomics, and Health Applications

This article has 1 author:
1. Diptarup Mallick
This article has no evaluationsLatest version Jan 19, 2026
LLMAgent4Bio: LLM Agents for Biological Intelligence Across Genomics, Proteomics, Spatial Biology, and Biomedicine

This article has 9 authors:
1. Sajib Acharjee Dip
2. Dipanwita Mallick
3. Uddip Acharjee Shuvo
4. Shovito Barua Soummo
5. Fazle Rafsani
6. Bikash Kumar Paul
7. Nazifa Ahmed Moumi
8. Shafayat Ahmed
9. Liqing Zhang
This article has no evaluationsLatest version Dec 16, 2025
A Survey on Efficient Protein Language Models

This article has 8 authors:
1. Shouren Wang
2. Debargha Ganguly
3. Vinooth Kulkarni
4. Wang Yang
5. Zhuoran Qiao
6. Daniel Blankenberg
7. Vipin Chaudhary
8. Xiaotian Han
This article has no evaluationsLatest version Dec 24, 2025

Discuss this preprint

Listed in

Abstract

Article activity feed

Related articles

Understanding Pathways in Bioinformatics, Genomics, and Health Applications

LLMAgent4Bio: LLM Agents for Biological Intelligence Across Genomics, Proteomics, Spatial Biology, and Biomedicine

A Survey on Efficient Protein Language Models