The Ribosomal Operon Database (ROD): A full-length rDNA operon database derived from genome assemblies

Abstract

Current rDNA reference sequence databases are tailored towards shorter DNA markers, such as parts of the 16S/18S gene or the ITS region. However, owing to advances in long-read DNA sequencing technologies, longer stretches of the rDNA operon are increasingly used in environmental sequencing studies to increase phylogenetic resolution. There is, therefore, a growing need for longer rDNA reference sequences. Here, we present the Ribosomal Operon Database (ROD), which includes eukaryotic full-length rDNA operons extracted from publicly available genome assemblies. Full-length operons were detected in 34.1% of the 34,701 examined eukaryotic genome assemblies from NCBI. In most of these cases (53.1%), more than one operon variant was detected, which can be due to intragenomic operon copy variability, allelic variation in non-haploid genomes, or technical errors from the sequencing and assembly process. The highest copy number found was 5,947, in Zea mays. In total, 453,697 unique operons were detected, with 69,480 operon variant clusters remaining after intragenomic clustering at 99% sequence identity. Operon length varied extensively across eukaryotes, ranging from 4,136 to 16,463 bp, which will lead to considerable PCR bias during amplification of the entire operon. Clustering the full-length operons revealed that the different parts (i.e., 18S, 28S, the hypervariable V4 region of 18S, and ITS) provide divergent taxonomic resolution, with 18S and the V4 region being the most conserved. ROD will be updated regularly to provide an increasing number of full-length rDNA operons to the scientific community.
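
To make the idea of intragenomic clustering at 99% sequence identity concrete, the following is a minimal sketch of greedy centroid clustering. It is not the pipeline used to build ROD: the function names are hypothetical, identity is approximated as 1 minus the edit distance divided by the longer sequence length, and the quadratic-time distance computation is only practical for toy inputs. Real operon sets would call for a dedicated clustering tool.

```python
from typing import Dict, Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Approximate identity (an assumption, not an alignment-based measure)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def greedy_cluster(seqs: Iterable[str], threshold: float = 0.99) -> Dict[str, List[str]]:
    """Assign each unique sequence to the first centroid it matches.

    Sequences are visited longest first (ties broken alphabetically);
    a sequence that matches no existing centroid becomes a new one.
    """
    centroids: List[str] = []
    clusters: Dict[str, List[str]] = {}
    for seq in sorted(set(seqs), key=lambda s: (-len(s), s)):
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            centroids.append(seq)
            clusters[seq] = [seq]
    return clusters
```

With a 200 bp toy sequence, a copy carrying a single substitution (99.5% identity) joins the original's cluster, whereas a copy carrying five substitutions (97.5% identity) founds a new cluster. The same logic explains how many unique operon copies within one genome can collapse into far fewer variant clusters.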
