EvoRMD: Integrating Biological Context and Evolutionary RNA Language Models for Interpretable Prediction of RNA Modifications

Abstract

RNA modifications are essential regulators of post-transcriptional gene expression, influencing RNA stability, localization, translation, and degradation. Determining the specific modification at a given nucleotide is therefore critical for understanding its regulatory role. Most computational approaches treat each modification type as an independent binary task. This strategy provides a macro-level statistical perspective, but it does not reflect that, under a defined biochemical or cellular condition, only one modification type can occur at a specific site. Current mapping assays also report a single observed modification per site, leaving all other types unlabeled rather than truly negative. These properties motivate a framework that can reason over competing modification types. We introduce EvoRMD, a unified model for biologically contextualized and interpretable prediction of RNA modification types. EvoRMD integrates contextual sequence embeddings from a large-scale RNA language model with structured biological metadata, including species, organ, cell type, and subcellular localization. A lightweight attention mechanism highlights informative sequence positions. A shared multi-class classifier then generates a context-conditioned plausibility distribution over eleven modification types (Am, Cm, Um, Gm, D, pseudouridine, m1A, m5C, m5U, m6A, m7G), consistent with the single-positive, multiple-unlabeled nature of existing datasets. Although trained in a multi-class setting, EvoRMD also produces calibrated multi-label predictions through sigmoid-transformed logits, enabling direct comparison with existing single-modification and multi-label methods. EvoRMD achieves strong predictive performance and offers interpretable insights through attention profiles and motif analyses. Together, these components establish a biologically grounded framework for identifying and prioritizing RNA modification types from sequence and context.
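The data flow described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the sequence and metadata representations are combined, so the concatenation below is an assumption, as are the function name `predict`, the dimensions, and the random stand-in values used in place of language-model embeddings and trained weights. The sketch shows only the three steps named in the text: attention pooling over sequence positions, a shared classifier giving a softmax distribution over the eleven types, and sigmoid-transformed logits as per-type scores. No calibration step is included.

```python
import math
import random

# The eleven modification types listed in the abstract.
MOD_TYPES = ["Am", "Cm", "Um", "Gm", "D", "pseudouridine",
             "m1A", "m5C", "m5U", "m6A", "m7G"]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict(seq_emb, ctx_emb, w_attn, W_cls, b_cls):
    """Hypothetical forward pass; all parameter names are illustrative.

    seq_emb: per-position embeddings, one list of floats per nucleotide
    ctx_emb: one vector standing in for the encoded biological metadata
    """
    # Attention weights over positions (one scalar score per position).
    attn = softmax([dot(h, w_attn) for h in seq_emb])
    # Attention-weighted average of the position embeddings.
    dim = len(seq_emb[0])
    pooled = [sum(a * h[j] for a, h in zip(attn, seq_emb)) for j in range(dim)]
    # Assumption: sequence and context features are simply concatenated.
    features = pooled + ctx_emb
    # One shared linear classifier produces a logit per modification type.
    logits = [dot(row, features) + b for row, b in zip(W_cls, b_cls)]
    multiclass = softmax(logits)               # competing types, sums to 1
    multilabel = [sigmoid(z) for z in logits]  # independent per-type scores
    return attn, multiclass, multilabel


# Random stand-ins for embeddings and weights (hypothetical sizes).
random.seed(0)
SEQ_LEN, EMB_DIM, CTX_DIM = 41, 16, 8
seq_emb = [[random.gauss(0, 1) for _ in range(EMB_DIM)] for _ in range(SEQ_LEN)]
ctx_emb = [random.gauss(0, 1) for _ in range(CTX_DIM)]
w_attn = [random.gauss(0, 1) for _ in range(EMB_DIM)]
W_cls = [[random.gauss(0, 0.1) for _ in range(EMB_DIM + CTX_DIM)]
         for _ in MOD_TYPES]
b_cls = [0.0] * len(MOD_TYPES)

attn, multiclass, multilabel = predict(seq_emb, ctx_emb, w_attn, W_cls, b_cls)
ranked = sorted(zip(MOD_TYPES, multiclass), key=lambda p: p[1], reverse=True)
```

Because softmax and sigmoid are both monotonic in the logits, the two outputs rank the modification types identically. They differ in interpretation: the softmax values compete and sum to one, while each sigmoid score can be thresholded on its own, which is what allows comparison with multi-label methods.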
