Assessing large-scale genomic language models in predicting personal gene expression: promises and limitations
Abstract
Large-scale genomic language models (gLMs) hold promise for modeling gene regulation, yet their ability to predict personal gene expression remains largely unexplored. We developed a framework, gLM2X-Tower, to benchmark gLMs and sequence-to-function (S2F) models on this task using paired personal genome-transcriptome data. With individual-level training, we found that gLMs (e.g., Evo2), like S2F models (e.g., AlphaGenome), remain unable to predict inter-individual expression variability for held-out genes. However, such training does improve prediction for seen genes in new individuals, particularly for gLMs, highlighting potential applications in few-shot settings such as for rare variants.
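The abstract distinguishes two evaluation regimes: generalization to held-out genes versus generalization to seen genes in new individuals. The sketch below is not the authors' code; it only illustrates, with placeholder data and hypothetical array names, how such a paired genome-transcriptome matrix could be partitioned along each axis.

```python
# Illustrative sketch (not from the paper): the two evaluation regimes
# described in the abstract. `expression` stands in for a hypothetical
# (genes x individuals) matrix of measured expression values.
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_individuals = 1000, 400
expression = rng.normal(size=(n_genes, n_individuals))  # placeholder data

# Regime 1: held-out genes -- train on a subset of genes and ask whether the
# model captures inter-individual variability for genes it has never seen.
gene_perm = rng.permutation(n_genes)
train_genes, test_genes = gene_perm[:800], gene_perm[800:]

# Regime 2: held-out individuals -- train on all of the training genes for
# some individuals, then test on the same (seen) genes in new individuals.
ind_perm = rng.permutation(n_individuals)
train_inds, test_inds = ind_perm[:320], ind_perm[320:]

train_block = expression[np.ix_(train_genes, train_inds)]
test_unseen_genes = expression[np.ix_(test_genes, test_inds)]       # regime 1
test_seen_genes_new_people = expression[np.ix_(train_genes, test_inds)]  # regime 2
```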