Assessing large-scale genomic language models in predicting personal gene expression: promises and limitations
Abstract
Large-scale genomic language models (gLMs) hold promise for modeling gene regulation, yet their ability to predict personal gene expression remains largely unexplored. We developed a framework, gLM2X-Tower, to benchmark gLMs and sequence-to-function (S2F) models on this task using paired personal genome-transcriptome data. With individual-level training, we found that gLMs (e.g., Evo2), like S2F models (e.g., AlphaGenome), remain unable to predict inter-individual expression variability for held-out genes. However, such training does improve prediction for seen genes in new individuals, particularly for gLMs, highlighting potential applications in few-shot settings such as for rare variants.
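The abstract distinguishes two evaluation regimes: generalization to held-out genes versus generalization to seen genes in new individuals. The sketch below is not the authors' code; it only illustrates, with placeholder data and hypothetical array names, how such a paired genome-transcriptome matrix could be partitioned along each axis.

```python
# Illustrative sketch (not from the paper): the two evaluation regimes
# described in the abstract. `expression` stands in for a hypothetical
# (genes x individuals) matrix of measured expression values.
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_individuals = 1000, 400
expression = rng.normal(size=(n_genes, n_individuals))  # placeholder data

# Regime 1: held-out genes -- train on a subset of genes and ask whether the
# model captures inter-individual variability for genes it has never seen.
gene_perm = rng.permutation(n_genes)
train_genes, test_genes = gene_perm[:800], gene_perm[800:]

# Regime 2: held-out individuals -- train on all of the training genes for
# some individuals, then test on the same (seen) genes in new individuals.
ind_perm = rng.permutation(n_individuals)
train_inds, test_inds = ind_perm[:320], ind_perm[320:]

train_block = expression[np.ix_(train_genes, train_inds)]
test_unseen_genes = expression[np.ix_(test_genes, test_inds)]       # regime 1
test_seen_genes_new_people = expression[np.ix_(train_genes, test_inds)]  # regime 2
```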