CoBRA: Compound Binding Site Prediction using RNA Language Model
Abstract
Ribonucleic acid (RNA) molecules perform a variety of functions within cells and are thus implicated in various human diseases, such as cancer. The fact that protein-coding mRNAs represent only a small portion of cellular RNA, together with the ability of RNA to form highly specific three-dimensional binding pockets for small molecules, has generated considerable interest in RNA as an alternative target for small-molecule drugs. As with proteins, precise prediction of small-molecule binding sites across different classes of RNA targets could therefore serve as a starting point for drug discovery.
In this study, we present a lightweight deep learning framework called Compound Binding Site Prediction for Ribonucleic Acid (CoBRA). The framework predicts ligand-binding nucleotides without requiring any explicit structural information. Our approach uses residue-level embeddings obtained from a pre-trained RNA language model. These embeddings encapsulate the contextual and statistical properties of each nucleotide and are used in a frozen state as the input to a multi-layer perceptron classifier that performs binary classification of binding versus non-binding residues. Models were trained using combinations of ten distinct RNA language models and six different loss functions on the TR60 and HARIBOSS datasets, and tested on four independent benchmark sets (RB9, JL10, TL12, and TE18). CoBRA achieves a relative improvement of 22.1% in Matthews correlation coefficient and an increase in sensitivity of 45.6% compared to existing state-of-the-art RNA-ligand binding site prediction methods that use structural information. These results show that sequence-based language model embeddings alone, which do not require any explicit coordinate or distance information, can match or outperform structure-based methods. This makes CoBRA a flexible tool for predicting binding sites across diverse RNA targets. CoBRA is available at https://github.com/WonkyeongJang/CoBRA.
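As a rough illustration of the pipeline described above, the following Python sketch applies a small multi-layer perceptron to per-nucleotide embeddings and scores the resulting binary predictions with sensitivity and the Matthews correlation coefficient. It is a minimal sketch and not the CoBRA implementation. The sequence, embedding dimension, hidden size, 0.5 threshold, random weights, and all function names are assumptions made for illustration, and the random vectors merely stand in for the frozen language-model embeddings.

```python
import math
import random


def mlp_binding_probabilities(embeddings, w1, b1, w2, b2):
    """Apply a one-hidden-layer MLP to each nucleotide embedding independently.

    embeddings: list of per-nucleotide vectors (kept fixed, i.e. "frozen").
    w1: list of hidden-unit weight vectors; b1: hidden biases.
    w2: output weights; b2: output bias.
    Returns one binding probability per nucleotide.
    """
    probs = []
    for x in embeddings:
        hidden = [
            max(0.0, sum(xi * wi for xi, wi in zip(x, unit)) + bias)  # ReLU
            for unit, bias in zip(w1, b1)
        ]
        logit = sum(h * w for h, w in zip(hidden, w2)) + b2
        probs.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid
    return probs


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels (1 = binding, 0 = non-binding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(y_true, y_pred):
    """Fraction of true binding nucleotides that were predicted as binding."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical input: a 5-nucleotide sequence with placeholder 8-dimensional
# embeddings. A real pipeline would obtain these from a pre-trained RNA
# language model; random values are used here only so the sketch runs.
rng = random.Random(0)
sequence = "GGCAU"
dim, hidden_size = 8, 4
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in sequence]

# Untrained, randomly initialised classifier weights (illustrative only).
w1 = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden_size)]
b1 = [0.0] * hidden_size
w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden_size)]
b2 = 0.0

probs = mlp_binding_probabilities(embeddings, w1, b1, w2, b2)
predicted = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 threshold

# Metric check on a small hand-made example (not data from the paper).
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
print(round(sensitivity(y_true, y_pred), 3))  # 0.667
print(round(mcc(y_true, y_pred), 3))          # 0.467
```

In the hand-made example there are 2 true positives, 4 true negatives, 1 false positive, and 1 false negative, so sensitivity is 2/3 and the Matthews correlation coefficient is (2·4 − 1·1)/√(3·3·5·5) = 7/15. Because binding nucleotides are typically a minority class, the Matthews correlation coefficient is a more informative summary than plain accuracy for this kind of per-residue task.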