Assessing large-scale genomic language models in predicting personal gene expression: promises and limitations

Abstract

Large-scale genomic language models (gLMs) hold promise for modeling gene regulation, yet their ability to predict personal gene expression remains largely unexplored. We developed a framework, gLM2X-Tower, to benchmark gLMs and sequence-to-function (S2F) models on this task using paired personal genome-transcriptome data. With individual-level training, we found that, like S2F models (e.g., AlphaGenome), gLMs (e.g., Evo2) remain incapable of predicting inter-person variability on held-out genes. However, such training improves prediction for seen genes in new individuals, particularly by gLMs, highlighting potential applications in few-shot settings such as rare variants.
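The abstract contrasts two evaluation regimes: generalization to held-out genes versus prediction for seen genes in new individuals. A minimal sketch of these two data splits is shown below; all names, shapes, and data are illustrative assumptions, not the paper's actual code or dataset.

```python
# Sketch of the two evaluation splits implied by the abstract:
#   regime 1 -- held-out genes: test on genes never seen in training
#   regime 2 -- new individuals: test on unseen people, but on genes seen in training
# All identifiers and dimensions here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n_individuals, n_genes = 100, 500
# Toy stand-in for a paired genome-transcriptome expression matrix
expression = rng.normal(size=(n_individuals, n_genes))  # individuals x genes

# Shuffle and partition genes and individuals independently
genes = rng.permutation(n_genes)
people = rng.permutation(n_individuals)
train_genes, test_genes = genes[:400], genes[400:]
train_people, test_people = people[:80], people[80:]

# Regime 1: training individuals evaluated on held-out genes
# (the setting where both gLMs and S2F models reportedly struggle)
unseen_gene_eval = expression[np.ix_(train_people, test_genes)]

# Regime 2: held-out individuals evaluated on genes seen during training
# (the setting where individual-level training reportedly helps, especially gLMs)
unseen_person_eval = expression[np.ix_(test_people, train_genes)]

print(unseen_gene_eval.shape)    # (80, 100)
print(unseen_person_eval.shape)  # (20, 400)
```

The key design point is that the two axes are split independently, so a model can be scored separately on cross-gene and cross-individual generalization.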