RNA3DClust: segmentation of RNA three-dimensional structures using a clustering-based approach
Abstract
A growing body of evidence shows that the biological activity of RNA molecules is due not only to their primary and secondary structures, but also to their spatial conformation. As the experimental determination of three-dimensional (3D) structures is a costly and uncertain process, the development of computational methods for predicting RNA folds is a necessity. A critical task in 3D structure prediction is finding substructures that can be predicted independently before being assembled into a global fold. In protein structures, these subparts are the “structural domains” and, to the best of our knowledge, no equivalent concept has ever been defined for RNA macromolecules.
In this work, we present RNA3DClust, an adaptation of the Mean Shift clustering algorithm to the RNA 3D structure partitioning problem. This approach allowed us to delimit compact and separate regions in RNA conformations, analogously to the seminal definition of domains in proteins. Developing RNA3DClust required us to create a dataset of RNA 3D domain annotations, as well as a new segmentation quality score, both of which we used to evaluate our method. Beyond macromolecular geometry, we also show that the RNA domain decompositions produced by RNA3DClust are consistent with data on RNA biological function reported in the literature. Finally, the emerging interest in long non-coding RNAs (lncRNAs) and their likelihood of containing locally folded regions has motivated us to generate an additional reference dataset of predicted lncRNA conformations. The resulting delineations of 3D domains by RNA3DClust illustrate the potential of our method for analyzing lncRNAs. Source code and datasets are freely available for download on the EvryRNA platform at https://evryrna.ibisc.univ-evry.fr.
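To illustrate the general idea behind Mean Shift clustering of 3D coordinates, the following is a minimal sketch and not the RNA3DClust implementation. It assumes one representative 3D coordinate per nucleotide, a flat kernel, and a hand-picked bandwidth; the function name `mean_shift`, the synthetic coordinates, and all parameter values are hypothetical choices made for this example only.

```python
import math
import random


def mean_shift(points, bandwidth, max_iter=100, tol=1e-4):
    """Flat-kernel Mean Shift; returns one cluster label per input point."""
    modes = []
    for p in points:
        center = p
        for _ in range(max_iter):
            # Points lying within the bandwidth radius of the current center.
            neighbours = [q for q in points if math.dist(center, q) <= bandwidth]
            if not neighbours:
                break
            # Move the center to the mean of its neighbours.
            new_center = tuple(sum(c) / len(neighbours) for c in zip(*neighbours))
            shift = math.dist(center, new_center)
            center = new_center
            if shift < tol:
                break
        modes.append(center)

    # Group points whose converged modes lie close together
    # (assumed merge radius: half the bandwidth).
    labels, cluster_centers = [], []
    for m in modes:
        for k, c in enumerate(cluster_centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(k)
                break
        else:
            cluster_centers.append(m)
            labels.append(len(cluster_centers) - 1)
    return labels


# Synthetic stand-in for per-nucleotide coordinates: two compact,
# well-separated groups of 30 points each (hypothetical data, not a real RNA).
random.seed(0)
coords = [(random.gauss(0.0, 2.0), random.gauss(0.0, 2.0), random.gauss(0.0, 2.0))
          for _ in range(30)]
coords += [(random.gauss(50.0, 2.0), random.gauss(0.0, 2.0), random.gauss(0.0, 2.0))
           for _ in range(30)]

labels = mean_shift(coords, bandwidth=15.0)
print(sorted(set(labels)))
```

In this toy setting, each label plays the role of a candidate domain: points that converge to the same density mode are grouped together. Unlike k-means, Mean Shift does not require the number of clusters in advance, which suits a problem where the number of domains per structure is unknown; the bandwidth, however, still has to be chosen.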