CAMITAX: Taxon labels for microbial genomes

Abstract

Background

The number of microbial genome sequences is increasing exponentially, especially thanks to recent advances in recovering complete or near-complete genomes from metagenomes and single cells. Assigning reliable taxon labels to genomes is key and often a prerequisite for downstream analyses.

Findings

We introduce CAMITAX, a scalable and reproducible workflow for the taxonomic labelling of microbial genomes recovered from isolates, single cells, and metagenomes. CAMITAX combines genome distance–, 16S ribosomal RNA gene–, and gene homology–based taxonomic assignments with phylogenetic placement. It uses Nextflow to orchestrate reference databases and software containers and thus combines ease of installation and use with computational reproducibility. We evaluated the method on several hundred metagenome-assembled genomes with high-quality taxonomic annotations from the TARA Oceans project, and we show that the ensemble classification method in CAMITAX improved on all individual methods across tested ranks.

Conclusions

While we initially developed CAMITAX to aid the Critical Assessment of Metagenome Interpretation (CAMI) initiative, it evolved into a comprehensive software package to reliably assign taxon labels to microbial genomes. CAMITAX is available under Apache License 2.0 at https://github.com/CAMI-challenge/CAMITAX.

Article activity feed

  1. Abstract

    **Reviewer 2. Bruno Fosso**

    The paper by Bremges et al. describes CAMITAX, a workflow designed for the taxonomic classification of microbial genomes obtained with NGS-based methodologies such as single-cell sequencing and metagenomics. Even if the four implemented methodologies themselves are not a real novelty in the field, their harmonization through a classification algorithm is interesting. Moreover, deploying the workflow in a container greatly simplifies both installation and usage and ensures that analyses are reproducible.

    The manuscript is well written and easy to read. All the proposed figures are appropriate and adequately support the data described in the main text. Figure 2 may be improved by using different colors, so that the paths through the plot are easier to tell apart.

    The CAMITAX GitHub repository clearly describes how to access and configure the container, but very little information is available about manual installation. The usage section needs improvement.

    I have some minor concerns about the paper:

    • The classification algorithm needs to be described in more depth. A figure may help the readers.
    • Regarding the overall drop in CAMITAX recall at mid-range ranks, I was wondering whether it may be because CAMITAX seems to be more conservative than the Delmont classification (Figure 2). The authors should discuss in how many cases CAMITAX is more conservative than the reference classification.
    • Moreover, the authors claim that "Notably, 95% of CAMITAX's predictions were consistent with Delmont et al., i.e. the two assignments were on the same taxonomic lineage and their LCA is either of the two." Does this mean the authors consider a classification consistent when CAMITAX assigns a genome at the kingdom rank while Delmont et al. assign it at the species rank? Please clarify.

    It would be useful to add some information about the technical requirements such as consumed RAM and required CPU time.
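The consistency criterion quoted in the review above ("the two assignments were on the same taxonomic lineage and their LCA is either of the two") can be illustrated with a minimal sketch. This is not CAMITAX's implementation: the function names, the representation of an assignment as a root-to-tip list of taxon names, and the example lineages are all assumptions made for illustration. Under a literal reading of the quoted definition, a kingdom-level label and a species-level label on the same lineage would count as consistent, which is exactly the case the reviewer asks the authors to clarify.

```python
from typing import List, Sequence

# Assumption: an assignment is a lineage listed from the root down to the
# most specific rank that was assigned (hypothetical representation).
Lineage = Sequence[str]


def lowest_common_ancestor(a: Lineage, b: Lineage) -> List[str]:
    """Return the longest shared root-to-tip prefix of two lineages."""
    shared: List[str] = []
    for taxon_a, taxon_b in zip(a, b):
        if taxon_a != taxon_b:
            break
        shared.append(taxon_a)
    return shared


def is_consistent(a: Lineage, b: Lineage) -> bool:
    """Literal reading of the quoted criterion: the LCA is one of the two assignments."""
    lca = lowest_common_ancestor(a, b)
    return lca == list(a) or lca == list(b)


# Hypothetical example lineages, for illustration only.
coarse = ["Bacteria"]
fine = ["Bacteria", "Proteobacteria", "Gammaproteobacteria"]
other = ["Bacteria", "Firmicutes"]

print(is_consistent(coarse, fine))  # coarse label lies on the fine lineage
print(is_consistent(fine, other))   # lineages diverge below "Bacteria"
```

In this sketch a more conservative (shallower) assignment never counts as a conflict, so consistency alone says nothing about how often one classifier stops at a higher rank than the other; that would have to be counted separately.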

  2. Now published in GigaScience doi: 10.1093/gigascience/giz154

    Authors: Andreas Bremges (1, 2), Adrian Fritz (1), Alice C. McHardy (1)

    Affiliations: (1) Computational Biology of Infection Research, Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany; (2) German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, 38124 Braunschweig, Germany

    For correspondence: andreas.bremges@helmholtz-hzi.de, alice.mchardy@helmholtz-hzi.de

    A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giz154), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

    These peer reviews were as follows:

    Reviewer 1: http://dx.doi.org/10.5524/REVIEW.102049