Semi-supervised segmentation of RNA 3D structures using density-based clustering

Abstract

A growing body of evidence shows that the biological activity of RNA molecules depends not only on their primary and secondary structures but also on their spatial conformation. This is analogous to proteins, where investigating function, folding, or evolution often requires dividing the three-dimensional (3D) structure into subparts that can be studied individually. These independent substructures, known as protein “3D domains”, are geometrically defined as compact and spatially separate regions of the polypeptide chain. For RNA macromolecules, however, no equivalent 3D-based concept has yet been formulated, to the best of our knowledge.

We present RNA3DClust, an application of the Mean Shift clustering algorithm to the RNA 3D structure partitioning problem. For this work, a dedicated post-clustering procedure was developed to address the peculiarities of delimiting 3D domains in RNA conformations. Tuning and benchmarking RNA3DClust required us to create reference datasets of RNA 3D domain annotations and to devise a new scoring function, the Chain Segment Distance (CSD), for assessing segmentation quality. Importantly, we show that the domain decompositions produced by RNA3DClust are consistent with those based on RNA biological function and evolution. Finally, the emerging interest in long non-coding RNAs (lncRNAs) and the likelihood that they contain folded regions motivated us to generate an additional reference dataset of predicted lncRNA conformations. The resulting delineations of 3D domains by RNA3DClust illustrate the potential of our method for analyzing lncRNA 3D structures. Source code and datasets are freely available for download on the EvryRNA platform at: https://evryrna.ibisc.univ-evry.fr.
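
To make the general idea concrete, the following is a minimal sketch of flat-kernel Mean Shift applied to ordered 3D coordinates, followed by a step that collapses the per-residue labels into contiguous chain segments. It is not the RNA3DClust implementation: the function names `mean_shift` and `to_segments`, the bandwidth value, the mode-merging threshold, and the synthetic coordinates are all assumptions chosen for illustration, and the dedicated post-clustering procedure and the CSD score described above are not reproduced here.

```python
import math
from itertools import groupby

def mean_shift(points, bandwidth, max_iter=100, tol=1e-4):
    """Flat-kernel Mean Shift: returns one cluster label per input point."""
    modes = []
    for p in points:
        x = p
        for _ in range(max_iter):
            # Neighbours of the current estimate within the bandwidth radius.
            near = [q for q in points if math.dist(x, q) <= bandwidth]
            # Shift the estimate to the centroid of those neighbours.
            new_x = tuple(sum(c) / len(near) for c in zip(*near))
            moved = math.dist(x, new_x)
            x = new_x
            if moved < tol:
                break
        modes.append(x)

    # Merge converged modes lying within half a bandwidth of each other
    # (illustrative threshold, not taken from the article).
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

def to_segments(labels):
    """Collapse per-residue labels into (start, end, label) runs along the chain."""
    segments, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length - 1, label))
        start += length
    return segments

# Synthetic stand-in for one atom per nucleotide: two compact blobs 30 units apart.
offsets = (-2.0, 0.0, 2.0)
blob = [(dx, dy, dz) for dx in offsets for dy in offsets for dz in offsets]
coords = blob + [(x + 30.0, y, z) for x, y, z in blob]

labels = mean_shift(coords, bandwidth=8.0)  # bandwidth is a hypothetical value
print(to_segments(labels))  # [(0, 26, 0), (27, 53, 1)]
```

On real structures, clusters found in space need not be contiguous along the sequence, which is presumably one reason a dedicated post-clustering step is needed; this sketch simply reports each run of identical labels as its own segment.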
