ProSynTaxDB: A curated protein database and workflow for taxonomic classification of Prochlorococcus and Synechococcus in metagenomes
Abstract
Prochlorococcus and Synechococcus are abundant marine picocyanobacteria that contribute significantly to ocean primary production. Recent genome sequencing efforts, including those presented here, have yielded a large number of high-quality reference genomes, enabling the classification of these picocyanobacteria in marine metagenomic sequence data at high phylogenetic resolution. When combined with environmental data, these classifications can guide cluster/clade/grade assignments and offer insights into niche differentiation within these populations. Here we present ProSynTaxDB, a curated protein sequence database and accompanying workflow aimed at enhancing the taxonomic resolution of Prochlorococcus and Synechococcus classification. ProSynTaxDB includes proteins from 1,260 genomes of Prochlorococcus and Synechococcus, including single-amplified genomes, high-quality draft genomes, and newly closed genomes. Additionally, ProSynTaxDB incorporates proteins from 27,799 genomes of marine heterotrophic bacteria, archaea, and viruses to assess microbial and viral communities surrounding Prochlorococcus and Synechococcus. This resource enables accurate classification of picocyanobacterial clusters/clades/grades in metagenomic data, even when they are present at 0.60% of reads for Prochlorococcus or 0.09% of reads for Synechococcus.