Cross-Attention Over RNA And Protein Sequences Enables Generalizable Interaction Prediction
Abstract
Computational predictions are essential to characterize the RNA–protein interaction landscape, yet a persistent gap between benchmark performance and practical utility suggests that current models have limited generalization capabilities. To address this issue, we present CORAL (Cross-attention for RNA-protein Association Learning), a deep learning framework for the prediction of RNA–protein interactions that integrates pretrained protein (ESM-2) and RNA (DNABERT2) language models through bidirectional cross-attention with Low-Rank Adaptation fine-tuning. We also introduce a benchmarking framework that rigorously addresses the problem of data redundancy between training and test sets, which greatly inflates the model performance reported in the literature. To this end, we adopt three partitioning strategies of increasing stringency: conventional random splits, pairwise non-redundant splits, and component-wise non-redundant splits. CORAL maintains an F1 score of 0.65 under the most stringent component-wise evaluation, compared to 0.55 for the next-best method. Interpretability analyses reveal that specific cross-attention heads systematically attend to structurally defined contact positions between RNA and protein molecules, showing 27% elevated attention at interface residues across 309 experimentally resolved complexes (p < 0.01). These findings establish that current RNA–protein interaction (RPI) prediction benchmarks substantially inflate performance estimates and demonstrate that cross-modal attention architectures yield improved generalization alongside mechanistically interpretable representations.
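To make the term "bidirectional cross-attention" concrete, the following is a minimal sketch of the general technique, not CORAL's actual implementation. It assumes a single attention head, random projection weights, and toy embedding sizes. All function names, dimensions, and sequence lengths are hypothetical, and the random vectors merely stand in for ESM-2 and DNABERT2 token embeddings.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: `queries` tokens attend to `context` tokens."""
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose keys
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Gaussian random matrix; a stand-in for learned weights or embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_rna, d_prot, d_attn = 8, 12, 4          # assumed toy dimensions
rna = random_matrix(5, d_rna, rng)        # 5 RNA tokens (placeholder embeddings)
protein = random_matrix(7, d_prot, rng)   # 7 protein residues (placeholder embeddings)

# Direction 1: RNA tokens query the protein sequence.
rna_out, rna_to_prot = cross_attention(
    rna, protein,
    random_matrix(d_rna, d_attn, rng),
    random_matrix(d_prot, d_attn, rng),
    random_matrix(d_prot, d_attn, rng),
)

# Direction 2: protein residues query the RNA sequence.
prot_out, prot_to_rna = cross_attention(
    protein, rna,
    random_matrix(d_prot, d_attn, rng),
    random_matrix(d_rna, d_attn, rng),
    random_matrix(d_rna, d_attn, rng),
)
```

In this sketch, `rna_to_prot` has one row per RNA token and one column per protein residue, and each row sums to 1. Weight matrices of this kind are what an interpretability analysis could compare against structurally defined contact positions, although how CORAL aggregates heads and layers is not specified here.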