SPLASH-structure: a statistical approach to identify RNA secondary structures from raw sequencing data, bypassing multiple sequence alignment
Abstract
RNA secondary and tertiary structure is critically involved in ribozyme and ribosomal RNA (rRNA) function, as well as in viral and cellular regulation. Traditional experimental methods for RNA structure determination, such as X-ray crystallography and chemical mapping, are incisive; however, these approaches suffer from low throughput and low dimensionality, respectively. Computational approaches that leverage evolutionary signals from mutations at correlated positions provide an alternative means to infer RNA structures. However, these methods require assembly and face challenges due to statistical biases inherent in multiple sequence alignment (MSA). Furthermore, they cannot make use of the full spectrum of natural variation seen for a given RNA element. Here, we introduce SPLASH-structure, a direct, assembly-free, MSA-free, and metadata-free statistical method that identifies conserved RNA structures by analyzing raw sequencing data and quantifying compensatory mutations or stem variation exclusion in putative RNA structures. We show that SPLASH-structure rediscovers known HIV structural elements and identifies conserved rRNA structures in metatranscriptomics samples. Moreover, SPLASH-structure finds Culex narnavirus 1, Gordis virus, and Culex mosquito virus 4, as well as previously unannotated viral genomes, in mosquito metatranscriptomics samples de novo, highlighting the method’s potential for viral discovery. SPLASH-structure is an ultra-fast, easy-to-use, and robust tool that excels in high-throughput RNA structure prediction and hypothesis generation, presenting a novel approach for discovering structural RNA elements.
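To make the notion of a compensatory mutation concrete, the following toy sketch checks whether a sequence variant preserves base pairing in a candidate stem. It is not SPLASH-structure's algorithm or statistic, which the abstract does not spell out. The function names (`can_pair`, `stem_is_paired`, `count_compensatory`), the RNA alphabet, the inclusion of G-U wobble pairs, and the example sequences are all assumptions made for illustration.

```python
# Illustrative only: a toy definition of "compensatory mutation" in a stem.
# Assumption: Watson-Crick pairs plus G-U wobble pairs count as valid pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    """Return True if nucleotides a and b can form a base pair."""
    return (a, b) in PAIRS


def stem_is_paired(left: str, right: str) -> bool:
    """Check that two stem arms (both written 5'->3') are fully paired.

    Position i of the left arm pairs with position len-1-i of the right arm.
    """
    if len(left) != len(right):
        return False
    return all(can_pair(a, b) for a, b in zip(left, reversed(right)))


def count_compensatory(ref_left: str, ref_right: str,
                       var_left: str, var_right: str) -> int:
    """Count stem positions where both partners mutated and pairing is kept."""
    n = len(ref_left)
    count = 0
    for i in range(n):
        j = n - 1 - i  # index of the pairing partner in the right arm
        both_changed = ref_left[i] != var_left[i] and ref_right[j] != var_right[j]
        if (both_changed
                and can_pair(ref_left[i], ref_right[j])
                and can_pair(var_left[i], var_right[j])):
            count += 1
    return count


# Hypothetical reference stem and two variants.
ref_left, ref_right = "GGAC", "GUCC"
compensated = ("GCAC", "GUGC")    # G->C on the left, C->G on its partner
uncompensated = ("GCAC", "GUCC")  # only the left arm mutated

print(stem_is_paired(ref_left, ref_right))
print(count_compensatory(ref_left, ref_right, *compensated))
print(stem_is_paired(*uncompensated))
```

In this sketch a variant that changes both partners of a pair while keeping them complementary is counted as compensatory, whereas a one-sided change breaks the stem. A real method working on raw reads would additionally need to decide which sequence segments form candidate stems and whether the observed variants are statistically significant, which this example does not attempt.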