Benchmarking Reveals the Superiority of Nucleic Acid Foundation Models in Predicting lncRNA Coding Potential


Abstract

Background: A subset of long noncoding RNAs (lncRNAs) contains short open reading frames and can encode functional micropeptides. However, identifying these coding lncRNAs (codlncRNAs) remains challenging because of weak coding signals, short peptide products, and heterogeneous evidence across databases. Existing computational tools lack unified benchmarks, and the utility of nucleic acid foundation models for this task remains unclear. Results: We constructed the first multi-species, evidence-stratified benchmark for codlncRNA prediction and systematically characterized codlncRNAs across molecular dimensions. CodlncRNAs consistently exhibited features intermediate between those of mRNAs and untranslated lncRNAs in sequence, structural, and physicochemical properties. Using this benchmark, we evaluated 12 classical tools and 4 foundation models. Classical methods showed limited zero-shot performance, whereas RNA-FM, RiNALMo, and DNABERT-2 achieved substantial gains after fine-tuning and demonstrated stronger cross-species generalization. Notably, DNABERT-2, trained solely on DNA, performed on par with or even better than RNA-specific models. An ensemble framework integrating foundation and classical models further improved robustness and has been deployed as an accessible web server. Conclusions: Our study establishes the first benchmark for codlncRNA prediction, delineates the distinctive transitional molecular profile of codlncRNAs, and demonstrates the effectiveness of nucleic acid foundation models and cross-species inference. Moreover, the proposed framework provides a practical, scalable computational foundation for micropeptide discovery and RNA functional characterization.
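
To make the notion of a short open reading frame concrete, the following is a minimal illustrative sketch of a forward-strand scan for ATG-initiated ORFs within a length window. It is not the method used in the study: the function name `find_short_orfs` and the `min_codons`/`max_codons` thresholds are hypothetical choices for illustration, and the scan assumes the standard genetic code and ignores non-canonical start codons.

```python
# Illustrative only: a naive forward-strand scan for short ORFs.
# Thresholds and names are assumptions, not taken from the preprint.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_short_orfs(seq, min_codons=10, max_codons=100):
    """Return (start, end, frame) tuples for ATG...stop ORFs.

    start/end are 0-based, end-exclusive, and end includes the stop codon.
    Length is counted in coding codons (start codon included, stop excluded).
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume searching after the stop codon
    return orfs


if __name__ == "__main__":
    toy = "CC" + "AUG" + "GCU" * 11 + "UAA" + "GG"  # toy RNA sequence
    print(find_short_orfs(toy))
```

A scan of this kind only enumerates candidate ORFs; whether a candidate is actually translated is the harder question that the benchmarked predictors address.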
