Alignment of RNA Secondary Structures with Arbitrary Pseudoknots using Structural Sequences

Abstract

Background: Comparison of RNA secondary structures is fundamental for RNA classification, motif analysis, and evolutionary studies. While efficient methods exist for pseudoknot-free structures, the comparison of RNA secondary structures with arbitrary pseudoknots remains computationally challenging.

Results: We introduce a novel representation of RNA secondary structures with arbitrary pseudoknots based on integer sequences, called structural sequences. On this representation, we define the SERNA distance, an extension of the classical edit distance with structural correctness constraints, and prove that it is a metric. We present SERNAlign, an open-source tool that computes the SERNA distance using dynamic programming with quadratic time complexity. To evaluate the proposed distance, we conduct two complementary experiments within a clustering-based evaluation framework: classification of experimentally validated pseudoknot motifs, which directly targets the design goal of SERNA, and phylogenetic clustering of ribosomal RNAs as a robustness check against existing structural distances. Across both tasks, SERNA demonstrates competitive clustering performance with respect to state-of-the-art comparison methods, while providing improved discrimination in complex motif settings and significantly lower computational cost compared to structure-based approaches.

Conclusions: Structural sequences provide a precise and computationally efficient abstraction for RNA secondary structures with arbitrary pseudoknots. The associated SERNA distance captures global structural organization, enabling structure comparison and effective clustering of complex RNA secondary structures without relying on primary sequence information. By balancing representational power and computational efficiency, SERNA complements existing methods for RNA secondary structure comparison in pseudoknotted settings.
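
The abstract describes the SERNA distance as an extension of the classical edit distance, computed by dynamic programming in quadratic time. As a rough illustration of that classical baseline only, the following Python sketch computes the unit-cost edit distance between two integer sequences. It is not the SERNA distance: the structural correctness constraints and the encoding of a structure as a structural sequence are not specified in the abstract, so the function name `edit_distance` and the example inputs are hypothetical placeholders rather than anything taken from SERNAlign.

```python
from typing import Sequence


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Classical unit-cost edit distance between two integer sequences.

    Assumption: this is the textbook baseline only; it omits the
    structural correctness constraints that the SERNA distance adds.
    """
    n, m = len(a), len(b)
    # prev[j] holds the distance between a[:i-1] and b[:j];
    # for i = 1 that is the cost of inserting the first j items of b.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        # curr[0] is the cost of deleting the first i items of a.
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            substitution = prev[j - 1] + (a[i - 1] != b[j - 1])
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[m]
```

The two nested loops visit each of the (n + 1)(m + 1) table cells once, which is where the quadratic running time comes from. Keeping only two rows reduces memory to linear in the length of the second sequence, at the cost of not being able to trace back an alignment.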
