gNOMO2: a comprehensive and modular pipeline for integrated multi-omics analyses of microbiomes

Abstract

Background

Over the past few years, the rise of omics technologies has offered an exceptional chance to gain deeper insight into the structural and functional characteristics of microbial communities. As a result, there is a growing demand for user-friendly, reproducible, and versatile bioinformatic tools that can effectively harness multi-omics data to offer a holistic understanding of microbiomes. Previously, we introduced gNOMO, a bioinformatic pipeline specifically tailored to analyze microbiome multi-omics data in an integrative manner. In response to the evolving demands within the microbiome field and the growing necessity for integrated multi-omics data analysis, we have implemented substantial enhancements to the gNOMO pipeline.

Results

Here, we present gNOMO2, a comprehensive and modular pipeline that can seamlessly manage various omics combinations, ranging from two to four distinct omics data types including 16S rRNA gene amplicon sequencing, metagenomics, metatranscriptomics, and metaproteomics. Furthermore, gNOMO2 features a specialized module for processing 16S rRNA gene amplicon sequencing data to create a protein database suitable for metaproteomics investigations. Moreover, it incorporates new differential abundance, integration and visualization approaches, all aimed at providing a more comprehensive toolkit and insightful analysis of microbiomes. The functionality of these new features is showcased through the use of four microbiome multi-omics datasets encompassing various ecosystems and omics combinations. gNOMO2 not only replicated most of the primary findings from these studies but also offered further valuable perspectives.

Conclusions

gNOMO2 enables the thorough integration of taxonomic and functional analyses in microbiome multi-omics data, opening up avenues for novel insights in both host-associated and free-living microbiome research. gNOMO2 is freely available at https://github.com/muzafferarikan/gNOMO2 .

Article activity feed

  1. This work has been peer reviewed in GigaScience (see paper), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Reviewer name: Yuan Jiang (R1)

    The authors have fully addressed my comments.

  2. Reviewer name: Yuan Jiang (original submission)

    Referee Report for "gNOMO2: a comprehensive and modular pipeline for integrated multi-omics analyses of microbiomes"

    This paper introduces gNOMO2, a new version of gNOMO, which is a bioinformatic pipeline for multi-omic management and analysis of microbiomes. The authors claim that gNOMO2 incorporates new differential abundance, integration, and visualization tools compared to gNOMO. However, these new features, as well as the distinction between gNOMO2 and gNOMO, have not been clearly presented in the paper. In addition, the Methods section is written as a pipeline of bioinformatic tools, and it is not clear what these tools are used for unless one is already familiar with all of them.

    My major comments are as follows:

    1. Given the existing work on gNOMO, it is critical for the authors to distinguish gNOMO2 from gNOMO to show its novelty. In the Methods section, the authors presented the six modules of gNOMO2. Are these all new relative to gNOMO, or did gNOMO already include some of these functions? A clearer presentation of gNOMO2 versus gNOMO is needed.
    2. The authors did not present the methods in each module very well. For example, the authors wrote in Module 2 that "MaAsLin2 [31] is employed to determine differentially abundant taxa based on both AS and MP data. Furthermore, a joint visualization of MP and AS results is performed using the combi R package [32]. The final outputs include AS and MP based abundance tables, results from differential abundance analysis, and joint visualization analysis results." Without reading references 31 and 32, it is very hard to understand what this module is really doing.
    3. The authors used the term "integrated multi-omics analysis" in all six modules of gNOMO2. It is not clear what this term really means. It reads as though this is not really an integrated analysis but rather a module that can handle different types of data separately, such as differential abundance analysis for each type. What integration other than joint visualization has been used? What new integration tools have been incorporated in gNOMO2?
    4. In the differential abundance analysis, does the pipeline consider the features of microbiome data, such as their count nature, sparsity, and compositionality? Can the modules incorporate covariates in their differential abundance analysis? It is quite useful to have covariates adjusted for in a differential abundance analysis.
    5. In the Analyses section, the authors applied gNOMO2 to re-analyze samples from previously published studies. They found some discrepancy between their results and the ones in the literature. Although some discrepancy is normal, the authors need to explain better what causes the discrepancy and whether it could yield different biological conclusions.
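To make the compositional and sparsity concerns in comment 4 concrete, here is a minimal sketch of the centered log-ratio (CLR) transform, one common way to handle compositional count data. It is an illustration only: it is not taken from the manuscript and does not describe what gNOMO2 or MaAsLin2 does internally. The function name `clr`, the example counts, and the pseudocount value are all hypothetical.

```python
import math


def clr(counts, pseudocount=0.0):
    """Centered log-ratio transform of one sample's taxon counts.

    `pseudocount` (a hypothetical choice) is added to every count so that
    zeros, which are frequent in sparse microbiome tables, have a logarithm.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log removes the arbitrary sequencing depth.
    return [value - mean_log for value in logs]


# Hypothetical counts for four taxa; the second sample is the same
# community sequenced ten times deeper.
shallow = [120, 30, 10, 840]
deep = [1200, 300, 100, 8400]

clr_shallow = clr(shallow)
clr_deep = clr(deep)

# A sparse sample with a zero count needs a pseudocount, otherwise
# math.log(0) raises ValueError.
sparse = [120, 30, 0, 850]
clr_sparse = clr(sparse, pseudocount=0.5)
```

In this sketch the two depth-scaled samples yield the same CLR values up to floating-point error, which is the sense in which the transform ignores sequencing depth, while the zero count shows why sparsity needs separate handling.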

  3. Reviewer name: Alexander Bartholomaus (original submission)

    Summary: "gNOMO2: a comprehensive and modular pipeline for integrated multi-omics analyses of microbiomes" by Arıkan and Muth presents a multi-omics tool for the analysis of prokaryotes. It is an evolution of the first version and offers various separate modules that take different types of input data. The authors present several example analyses based on already published data and reproduce the results. The manuscript is very well written (I could not detect a single typo) and it was fun to read! Well done! I have only very few comments and suggestions, see below. However, I had a problem executing the code.

    Key questions to answer:

    1. Are the methods appropriate to the aims of the study, are they well described, and are necessary controls included? Yes
    2. Are the conclusions adequately supported by the data shown? Yes
    3. Please indicate the quality of language in the manuscript. Does it require a heavy editing for language and clarity? Very well written!
    4. Are you able to assess all statistics in the manuscript, including the appropriateness of statistical tests used? No direct statistics given in the manuscript. Maybe the authors could include some example output as .zip file for interested potential users.

    Detailed comments to the manuscript:

    Line 168: What does "cleaned and redundancies are removed" mean? Are only identical genomes removed? Or are genome parts that are identical removed (I guess these barely exist, except for conserved gene regions such as 16S or similar)? Or are only redundant genes removed? How is redundancy defined, for example as a 99% identical stretch?

    Lines 399-405: Looking at Figure 5A, I am wondering how Fluviicoccus and Methanosarcina appear relatively abundant in the MP fraction in some samples. Were they de novo assembled in the MG or MT modules?

    General comment on figures: I know that automatic figure generation is difficult to get right, especially the axis labels (as names have very different lengths). However, I think some figures might be hardly visible in the printed version; in particular, the axis labels for the B panels are very small. Maybe you can put the critical figures separately in the supplement, e.g. each B panel on one page.

    Suggestions: As suggested above, maybe the authors could include some example output (a simple example) as a .zip file for interested potential users. This would give an idea of what the output looks like and what to expect besides the plots. Differential abundance tables might be more important than the plots, as users would generate their own plots for later publications.

    GitHub and software: I also tested the software and followed the instructions on GitHub. I successfully executed the "Requirements" and "Config" steps (including creation of the metadata file and copying of the amplicon reads) and tried to execute Module 1.

    However, the following error occurred (using up-to-date conda and snakemake on Ubuntu Linux):

        (snakemake) abartho@gmbs17:~/review_papers/GigaScience/gNOMO2$ snakemake -v
        6.15.5
        (snakemake) abartho@gmbs17:~/review_papers/GigaScience/gNOMO2$ snakemake -s workflow/Snakefile --cores 20
        SyntaxError in line 9 of /home/abartho/miniconda3/envs/snakemake/lib/python3.6/site-packages/smart_open/s3.py:
        future feature annotations is not defined (s3.py, line 9)
          File "/home/abartho/miniconda3/envs/snakemake/lib/python3.6/site-packages/smart_open/__init__.py", line 34, in <module>
          File "/home/abartho/miniconda3/envs/snakemake/lib/python3.6/site-packages/smart_open/smart_open_lib.py", line 35, in <module>
          File "/home/abartho/miniconda3/envs/snakemake/lib/python3.6/site-packages/smart_open/doctools.py", line 21, in <module>
          File "/home/abartho/miniconda3/envs/snakemake/lib/python3.6/site-packages/smart_open/transport.py", line 104, in <module>
          File "/home/abartho/miniconda3/envs/snakemake/lib/python3.6/site-packages/smart_open/transport.py", line 49, in register_transport
          File "/home/abartho/miniconda3/envs/snakemake/lib/python3.6/importlib/__init__.py", line 126, in import_module

    In addition to solving the problem, an example metadata file and some explanation of the output (which I have not seen yet) would be good for less experienced users.
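A note on the likely cause, offered as a hedged reading of the traceback rather than a confirmed diagnosis: the paths show the environment running Python 3.6, and the message "future feature annotations is not defined" is what an interpreter older than 3.7 reports when a module contains `from __future__ import annotations` (introduced by PEP 563 in Python 3.7). The sketch below checks whether a given interpreter accepts that import; the function names are hypothetical and not part of gNOMO2, snakemake, or smart_open.

```python
import sys

# `from __future__ import annotations` is recognised from Python 3.7 on;
# older interpreters reject it with a SyntaxError at compile time.
MIN_VERSION = (3, 7)


def supports_future_annotations(version_info=sys.version_info):
    """Return True if the given version accepts the annotations future import."""
    return tuple(version_info[:2]) >= MIN_VERSION


def compiles_future_annotations():
    """Compile the import directly; a SyntaxError means it is unsupported."""
    try:
        compile("from __future__ import annotations\n", "<check>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("version check:", supports_future_annotations())
    print("compile check:", compiles_future_annotations())
```

Running this inside the activated conda environment would show whether the interpreter itself is too old for the installed smart_open release; if so, recreating the environment with a newer Python is one plausible remedy.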