Refinement of the “Candidatus Accumulibacter” Genus Based on a Metagenomic Analysis of Biological Nutrient Removal (BNR) Pilot-Scale Plants Operated with Reduced Aeration

Abstract

Members of the “Candidatus Accumulibacter” genus are widely studied as key polyphosphate-accumulating organisms (PAOs) in biological nutrient removal (BNR) facilities performing enhanced biological phosphorus removal (EBPR). This diverse lineage includes 18 “Ca. Accumulibacter” species, which have been proposed based on the phylogenetic divergence of the polyphosphate kinase 1 (ppk1) gene and genome-scale comparisons of metagenome-assembled genomes (MAGs). Phylogenetic classification based on the 16S rRNA genetic marker has been difficult to attain because most “Ca. Accumulibacter” MAGs are incomplete and often do not include the rRNA operon. Here, we investigate the “Ca. Accumulibacter” diversity in pilot-scale treatment trains performing BNR under low dissolved oxygen (DO) conditions using genome-resolved metagenomics. Using long-read sequencing, we recovered medium- and high-quality MAGs for 5 of the 18 “Ca. Accumulibacter” species, all with rRNA operons assembled, which allowed a reassessment of the 16S rRNA-based phylogeny of this genus and an analysis of phylogeny based on the 23S rRNA gene. In addition, we recovered a cluster of MAGs that, based on 16S rRNA, 23S rRNA, ppk1, and genome-scale phylogenetic analyses, do not belong to any of the currently recognized “Ca. Accumulibacter” species, and for which we propose the new species designation “Ca. Accumulibacter jenkinsii” sp. nov. Relative abundance evaluations of the genus across all pilot plant operations revealed that, regardless of the operational mode, “Ca. A. necessarius” and “Ca. A. propinquus” accounted for more than 40% of the “Ca. Accumulibacter” community, whereas the newly proposed “Ca. A. jenkinsii” accounted for about 5% of the “Ca. Accumulibacter” community.

IMPORTANCE

One of the main drivers of energy use and operational costs in activated sludge processes is the amount of oxygen provided to enable biological phosphorus and nitrogen removal. Wastewater treatment facilities are increasingly considering reduced aeration to decrease energy consumption, and whereas successful BNR has been demonstrated in systems with minimal aeration, an adequate understanding of the microbial communities that facilitate nutrient removal under these conditions is still lacking. In this study, we used genome-resolved metagenomics to evaluate the diversity of the “Candidatus Accumulibacter” genus in pilot-scale plants operating with minimal aeration. We identified the “Ca. Accumulibacter” species enriched under these conditions, including one novel species for which we propose “Ca. Accumulibacter jenkinsii” sp. nov. as its designation. Furthermore, the MAGs obtained for 5 additional “Ca. Accumulibacter” species further refine the phylogeny of the “Ca. Accumulibacter” genus and provide new insight into its diversity within unconventional biological nutrient removal systems.

Article activity feed

  1. The datasets can also be

    Right now I think only the metagenomes are available in the SRA and I don't see the genomes in GenBank... it would be great to upload those there as well if possible!

  2. The program coverM (v0.6.1) (https://github.com/wwood/CoverM) was used to obtain the relative abundance of reads mapped onto each MAG with the “coverm genome” command.

    Were BAM files passed to coverM, or was minimap2 used to map the reads? I would also test how complete your assemblies are by mapping the reads back to the assemblies (PacBio only or Illumina only) and reporting those stats. If that's somewhere and I missed it, sorry!

  3. The shortened PacBio reads and the cleaned short-reads from Illumina libraries were competitively mapped

    Hmm, interesting. I might have instead done this mapping with only PacBio (with the reads fragmented) or only Illumina, since there could be "identical" reads in the two sets, and I think this would count them twice toward abundance.

  4. Phylogenetic analyses and relative abundance

    I might have missed this - what were the methods for assessing quality (CheckM?) and identifying ribosomal genes? I'm guessing Prokka/infernal?

  5. All assembled contigs generated from both PacBio and Illumina sequencing technology were binned using metaBAT2 (45).

    Also just to keep in mind if someone wants to follow up, there are now binning tools that are specific to long-read based assemblies, or sometimes even manual binning with mmgenome2 could work if the assemblies aren't too fragmented

  6. assembled using metaSPAdes (v3.15.2) (44), and mapped to the final assembly using BBMap (v38.86)

    I'm guessing this was a coassembly of the two Illumina samples?

  7. and assembled using metaFlye (v2.9) (40). PacBio assemblies were polished with racon (v1.4.13)

    I think with this generation of PacBio reads, even metagenomic ones, the recommended pipeline would be assembly with hifiasm (for which I think there's a metagenomic version as well), then using the resulting assembly directly for downstream processes. HiFi reads are considered high enough quality not to need these polishing steps, and polishing can actually introduce indels accidentally, especially in repeat-rich areas like rRNAs. See here: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009802 (though that is mostly applied to Nanopore).

  8. PacBio reads were filtered using BBtools (v38.87/38.88)

    Not sure I've seen filtering steps applied to PacBio reads prior to assembly. Which program in BBTools was used, and why? For filtering out short reads?

  9. For long-read sequencing, libraries were prepared by shearing genomic DNA to either 3 kb or 6-10 kb,

    Ah OK, I see the answer here to my question above; it was a little confusing before.

  10. that “Ca. A. necessarius” and “Ca. A. propinquus” accounted for greater than 40% of the “Ca. Accumulibacter” assemblage,

    I know you're focused on Accumulibacter, but I'm curious: 1) how much of the total relative abundance does Accumulibacter make up, and 2) were there other lineages besides Accumulibacter that were quite abundant? Since you seem to have good HQ MAGs from Accumulibacter with PacBio, I would expect that with a little more effort you would have some good flanking genomes. Most good-quality flanking genomes come from bioreactors or the Danish WWTP study, so this could be a good resource for the community.

  11. high quality (HQ, greater than 90% completeness, less than 5% contamination) according to MiMAG standards (28). In addition, they have two fully assembled copies of the rRNA operon, which facilitates additional analysis of this novel cluster and the proposal for a new species epithet (see below).

    I think HQ based on MiMAG standards is above 90% completeness, below 5% redundancy, presence of all 3 rRNAs, and at least 18 tRNAs (I might be wrong on this number). By "two fully assembled copies of the rRNA operon," do you mean there are two of each rRNA gene? Are they fragmented at all (just curious)?

  12. That is, UW14 belonged to “Ca. A. meliphilus,” UW15 to “Ca. A. delftensis,” UW16 and UW24 to “Ca. A. propinquus,” UW17 to “Ca. A. contiguus,” and UW19, UW28, and UW29 to “Ca. A. necessarius” (Table 2).

    Interesting how the genomes from these pilot-scale plants aren't from the species we find enriched in the bioreactors, if I remember the species names correctly. The IA and IIA genomes are usually what pop up in the bioreactors, but they are not abundant here.

  13. Among the MAGs assembled from these metagenomes were 16 MAGs taxonomically classified, according to the GTDB-Tk Lineage classification, as belonging to the Accumulibacter lineage.

    This is also worded in a slightly confusing way.

  14. 15 6-10kb PacBio, 7 3kb PacBio

    The way this is written is a little confusing. Are there 22 long-read metagenomes in total, of which 15 had a read length range of 6-10 kb and the other 7 had 3 kb?

  15. Under these cyclic anaerobic-aerobic conditions, net P removal from the bulk liquid is achieved

    This sentence almost seems like it fits better at the end of the last paragraph, and your topic sentence is the second sentence of this paragraph perhaps?
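
As a rough illustration of the MIMAG quality tiers raised in comment 11, the Python sketch below sorts a MAG into a high-, medium-, or low-quality draft. It assumes the published MIMAG thresholds: high quality requires more than 90% completeness, less than 5% contamination, the 5S, 16S, and 23S rRNA genes, and at least 18 tRNAs; medium quality requires at least 50% completeness and less than 10% contamination; low quality is below 50% completeness with less than 10% contamination. The `MagStats` class, its field names, and the `mimag_tier` function are hypothetical names introduced here for illustration and are not part of the authors' pipeline or of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class MagStats:
    """Summary statistics for one MAG (hypothetical field names)."""
    completeness: float   # percent, from a marker-gene based estimate
    contamination: float  # percent, from the same estimate
    has_5s: bool          # 5S rRNA gene detected
    has_16s: bool         # 16S rRNA gene detected
    has_23s: bool         # 23S rRNA gene detected
    trna_count: int       # number of tRNAs detected


def mimag_tier(mag: MagStats) -> str:
    """Return 'high', 'medium', 'low', or 'unclassified' for a MAG."""
    # MAGs with 10% or more contamination fall outside all three draft tiers.
    if mag.contamination >= 10.0:
        return "unclassified"

    all_rrnas = mag.has_5s and mag.has_16s and mag.has_23s
    if (
        mag.completeness > 90.0
        and mag.contamination < 5.0
        and all_rrnas
        and mag.trna_count >= 18
    ):
        return "high"

    if mag.completeness >= 50.0:
        return "medium"

    return "low"


if __name__ == "__main__":
    # Made-up example values, not taken from the preprint.
    example = MagStats(95.0, 2.0, True, True, True, 20)
    print(mimag_tier(example))
```

Under this sketch, a MAG that passes the completeness and contamination cutoffs but lacks one rRNA gene or has fewer than 18 tRNAs drops to the medium tier, which is why the presence of assembled rRNA operons matters for the HQ label.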