Spatial-neighbour encoding enables fast RNA 3D structure search
Abstract
Experimental and predicted RNA three-dimensional structures are expanding rapidly, but RNA structure search still lacks a compact residue-level representation that supports database-scale comparison. Using family-held-out ablations across the currently available collection of experimental RNA structures, we found that spatial-neighbour features are markedly more informative for family-level discrimination than conventional backbone and base descriptors. Building on this result, we developed RiboSeek, a search framework based on a 20-letter geometric alphabet (RS-20) and an 80-letter structure-and-base composite alphabet (RS-80). Across family-level classification and retrieval benchmarks, RS-80 delivered the strongest overall performance, whereas RS-20 most closely tracked the US-align TM-score, indicating better preservation of geometric similarity. RiboSeek searches the full experimental RNA structure database in 204 ms per query and can be applied to libraries of predicted RNA structures to prioritize candidate structural relationships for downstream analysis.
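The abstract does not specify which spatial-neighbour features RiboSeek uses or how its letters are learned, so the following is only a minimal sketch of the general idea of a spatial-neighbour structural alphabet, not RiboSeek's method. It assumes one representative 3D coordinate per residue, uses two hypothetical features (distance to the nearest residue that is not a close chain neighbour, and the signed sequence offset to it), and assigns each residue the letter of its nearest centroid. The composite step assumes that an 80-letter code is formed as 20 geometric states × 4 bases (A, C, G, U); that decomposition is an inference from the alphabet sizes, not something the abstract states. All function names and the toy centroids are illustrative.

```python
import numpy as np

BASES = "ACGU"  # assumed nucleotide alphabet for the hypothetical composite code


def neighbour_features(coords, min_sep=3, max_offset=20):
    """Hypothetical per-residue spatial-neighbour features.

    For residue i, find the spatially closest residue j with |j - i| >= min_sep
    and return (distance to j, signed sequence offset j - i clipped to
    +/- max_offset). Requires more than min_sep residues.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    offsets = np.arange(n)[None, :] - np.arange(n)[:, None]  # offsets[i, j] = j - i
    dist[np.abs(offsets) < min_sep] = np.inf  # ignore self and near-chain neighbours
    j = dist.argmin(axis=1)
    d = dist[np.arange(n), j]
    off = np.clip(offsets[np.arange(n), j], -max_offset, max_offset)
    # Real features would need scaling, since angstroms and residue counts differ.
    return np.stack([d, off], axis=1)


def quantize(features, centroids):
    """Assign each feature vector the index of its nearest centroid (one letter per state)."""
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def composite_states(geo_states, sequence):
    """Combine a geometric state (0..K-1) with base identity into K * 4 composite states."""
    base_idx = np.array([BASES.index(b) for b in sequence.upper()])
    return geo_states * len(BASES) + base_idx


if __name__ == "__main__":
    coords = [[6.0 * i, 0.0, 0.0] for i in range(6)]  # toy straight chain, 6 A spacing
    feats = neighbour_features(coords)
    centroids = np.array([[18.0, 3.0], [18.0, -3.0]])  # toy 2-state "alphabet"
    geo = quantize(feats, centroids)
    print(composite_states(geo, "ACGUAC").tolist())  # [0, 1, 2, 7, 4, 5]
```

With K = 20 learned centroids (for example from clustering features over a training set), `quantize` would yield a 20-letter string per chain and `composite_states` an 80-letter one, which could then be compared with ordinary sequence-alignment tools.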