Benchmarking Pre-trained Genomic Language Models for RNA Sequence-Related Predictive Applications


Abstract

RNA plays a pivotal role in diverse cellular functions across organisms, so developing computational algorithms for RNA sequence-related questions is highly valuable. Recently, pre-trained genomic language models (gLMs) have emerged, offering flexibility across a range of downstream prediction tasks. However, comprehensive and fair evaluations of gLMs are lacking. In this study, we benchmark eight gLMs on prediction tasks covering four RNA processes, highlighting their strengths and limitations. While gLMs excel in overall performance, larger models are not always better. Interestingly, models that integrate biological information consistently perform well on related tasks. Notably, gLMs demonstrate superior performance when training data are limited, whereas task-specific methods achieve comparable performance with better computational efficiency when sufficient training data are available. Finally, we provide recommendations for model selection in different scenarios. These evaluation results underscore the potential of gLMs and suggest areas for future improvement.