16S-23S-5S rRNA Database: a comprehensive integrated database of archaeal and bacterial rRNA sequences, alignments, intragenomic heterogeneity, and secondary structures

Abstract

The 16S-23S-5S ribosomal RNA (rRNA) gene operon is a cornerstone of ribosomal architecture and function, making it an indispensable molecular marker for microbial taxonomy and phylogenetics. Despite the extensive development of 16S rRNA gene databases, there is still no comprehensive resource that integrates all three rRNA genes (16S, 23S, and 5S) with their alignments, intragenomic heterogeneity, and structural details. To address this gap, we introduce the 16S-23S-5S rRNA Database, a curated and intuitive platform encompassing around 5,000 complete bacterial and archaeal genomes. For each rRNA component, the database provides operon-level annotations, alignment data, copy number statistics, percent identity matrices, and predicted secondary structures. Our methodology combines covariance models from Rfam with BLAST-based sequence comparisons, Clustal Omega for multiple sequence alignment, and Circos for genomic visualization, thereby characterizing the genomic architecture, sequence diversity, and structural conservation of rRNA genes. Crucially, the database systematically accounts for intragenomic heterogeneity, a factor critical for accurate microbial classification and for inferring evolutionary trajectories. The most represented phyla are Pseudomonadota (1,891 genomes), Actinomycetota (846 genomes), and Bacillota (773 genomes), with Bacillota exhibiting the highest average rRNA gene copy numbers (16S = 6.64 ± 3.3, 23S = 6.63 ± 3.3, and 5S = 6.8 ± 3.3). Across all bacterial genomes, the 16S, 23S, and 5S rRNA genes had average copy numbers of 4.25 ± 2.88, 4.24 ± 2.89, and 4.44 ± 3.08, respectively. The responsive web interface, built with HTML5, JavaScript, and Flask, incorporates user feedback mechanisms to improve usability.
By integrating these elements, the 16S-23S-5S rRNA Database fills a substantial gap in rRNA data analysis and establishes a foundational resource for large-scale investigations in microbial systematics, genome evolution, and ribosomal biology, benefiting taxonomists, microbial genomicists, and microbiome researchers alike.
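
The abstract mentions percent identity matrices and intragenomic heterogeneity but does not state how they are computed. The following is a minimal sketch, not the database's code, assuming the rRNA copies from one genome are already aligned (for example, Clustal Omega output in aligned FASTA, parsing not shown) and assuming one common identity convention: gap-gap columns are skipped, and a gap in only one sequence counts as a mismatch. The function names and toy sequences are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length aligned sequences.

    Assumed convention (one of several in use): gap-gap columns are skipped,
    and a column with a gap in only one sequence counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column absent from both copies
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Symmetric all-versus-all percent identity matrix for aligned copies."""
    names = list(aligned)
    matrix = [
        [100.0 if i == j else percent_identity(aligned[p], aligned[q])
         for j, q in enumerate(names)]
        for i, p in enumerate(names)
    ]
    return names, matrix


def heterogeneity_summary(aligned: dict[str, str]) -> dict[str, int | float]:
    """Copy number, distinct ungapped variants, and lowest pairwise identity."""
    names, matrix = identity_matrix(aligned)
    pairwise = [matrix[i][j] for i, j in combinations(range(len(names)), 2)]
    variants = {seq.replace("-", "").upper() for seq in aligned.values()}
    return {
        "copies": len(names),
        "distinct_variants": len(variants),
        "min_pairwise_identity": min(pairwise) if pairwise else 100.0,
    }


# Toy, hypothetical aligned 16S fragments from three operons of one genome.
copies = {
    "rrnA_16S": "ACGT-ACGTA",
    "rrnB_16S": "ACGT-ACGTA",
    "rrnC_16S": "ACGTTACGCA",
}
print(heterogeneity_summary(copies))
# {'copies': 3, 'distinct_variants': 2, 'min_pairwise_identity': 80.0}
```

Here two of the three copies are identical and the third differs at two aligned columns, so the genome carries two 16S variants with a lowest pairwise identity of 80%. Other gap conventions (for example, dividing by the shorter ungapped length) give different values, so identity figures are only comparable when computed the same way.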

Availability and implementation: The 16S-23S-5S rRNA Database is available as an open-access database at https://project.iith.ac.in/sharmaglab/rrnadatabase/ .

Importance

Understanding the diversity of microbes is essential for research in health, environment, and biotechnology. Ribosomal RNA (rRNA) genes, especially the 16S gene, are commonly used to classify microbes, but studies rarely consider the full set of rRNA genes (16S, 23S, and 5S) together. Our work bridges this gap by creating the first comprehensive database that includes all three rRNA genes from around 5,000 bacterial and archaeal genomes. It offers detailed annotations, sequence alignments, intragenomic heterogeneity analyses, and structural insights, helping scientists classify microbes more accurately and trace their evolution. This resource is designed to be easy to use and will support a wide range of microbiological research.