MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle

Abstract

The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet inferring metabolic capabilities from such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large “omic” datasets, including entire biogeochemical cycles. MEBS is open source and available at https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of completely sequenced sulfur genomes. Using the mathematical framework of relative entropy (H′), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences, such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operating characteristic (ROC) plots, and the area under the curve (AUC) metric. Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized genomes, and metagenomic environments such as hydrothermal vents or deep-sea sediment.
Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated or unexplored taxa.
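The abstract describes the scoring idea only in words. The Python sketch below illustrates it under stated assumptions, and is not the MEBS implementation (see the repository URL above for the real code). It assumes that H′ for a domain is the single-term relative entropy q·log2(q/p), where q is the fraction of target (e.g. sulfur) genomes containing the domain and p is the fraction in the whole background collection; that a genome's score is the sum of H′ over the modeled domains it contains; and that AUC is computed as the probability that a randomly chosen target genome outscores a non-target one. All function names, the toy genomes, and the domain names DOM_B and DOM_C are hypothetical; only PF04358 (DsrC) comes from the text.

```python
import math
from typing import Dict, Iterable, List, Set

Genome = Set[str]  # set of Pfam domain identifiers annotated in one genome


def presence_frequency(genomes: List[Genome], domain: str) -> float:
    """Fraction of genomes in which `domain` occurs at least once."""
    return sum(domain in g for g in genomes) / len(genomes)


def relative_entropy(q: float, p: float) -> float:
    """Assumed form of H': the single term q * log2(q / p)."""
    if q == 0.0:
        return 0.0  # convention: 0 * log2(0 / p) = 0
    if p == 0.0:
        raise ValueError("domain seen in target set but not in background")
    return q * math.log2(q / p)


def domain_entropies(target: List[Genome], background: List[Genome],
                     domains: Iterable[str]) -> Dict[str, float]:
    """H' per domain, comparing target vs. background presence frequencies."""
    return {
        d: relative_entropy(presence_frequency(target, d),
                            presence_frequency(background, d))
        for d in domains
    }


def genome_score(genome: Genome, entropies: Dict[str, float]) -> float:
    """Hypothetical score: sum of H' over the modeled domains present."""
    return sum(h for d, h in entropies.items() if d in genome)


def auc(pos: List[float], neg: List[float]) -> float:
    """ROC AUC as P(score_pos > score_neg), counting ties as 0.5."""
    wins = 0.0
    for s in pos:
        for t in neg:
            wins += 1.0 if s > t else 0.5 if s == t else 0.0
    return wins / (len(pos) * len(neg))


# Toy data; DOM_B and DOM_C are hypothetical domain names.
sulfur = [{"PF04358", "DOM_B", "DOM_C"}, {"PF04358", "DOM_C"}]
others = [{"DOM_C"}, {"DOM_B", "DOM_C"}]
background = sulfur + others  # background contains the target genomes

H = domain_entropies(sulfur, background, ["PF04358", "DOM_B", "DOM_C"])
pos_scores = [genome_score(g, H) for g in sulfur]
neg_scores = [genome_score(g, H) for g in others]
print(H)                            # {'PF04358': 1.0, 'DOM_B': 0.0, 'DOM_C': 0.0}
print(auc(pos_scores, neg_scores))  # 1.0
```

In this toy case only PF04358 is enriched among the target genomes (present in all of them but in half of the background), so it alone carries information; DOM_C occurs everywhere and DOM_B is equally frequent in both sets, so both get H′ = 0. Because the background is assumed to include the target genomes, p can never be 0 when q > 0, which is why the sketch raises an error instead of adding a pseudocount.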

  1. Now published in GigaScience doi: 10.1093/gigascience/gix096

    Valerie De Anda 1*, Icoquih Zapata-Peñasco 2, Augusto Cesar Poot-Hernandez 3, Luis E. Eguiarte 1, Bruno Contreras-Moreira 4,5*, Valeria Souza 1*

    1 Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, 70-275, Coyoacán 04510 México D.F.
    2 Dirección de Investigación en Transformación de Hidrocarburos. Instituto Mexicano del Petróleo, Eje Central Lázaro Cárdenas, Norte 152, Col. San Bartolo Atepehuacan, 07730, México
    3 Departamento de Ingeniería de Sistemas Computacionales y Automatización. Sección de Ingeniería de Sistemas Computacionales. Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas.
    4 Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas (EEAD-CSIC), Avda. Montañana, 1005, Zaragoza 50059, Spain
    5 Fundación ARAID, calle María de Luna 11, 50018 Zaragoza, Spain

    * For correspondence: valdeanda@ciencias.unam.mx, bcontreras@eead.csic.es, souza@unam.mx

    A version of this preprint has been published in the Open Access journal GigaScience, where the paper and peer reviews are published openly under a CC-BY 4.0 license: https://doi.org/10.1093/gigascience/gix096

    These peer reviews were as follows:

    Reviewer 1: http://dx.doi.org/10.5524/REVIEW.100872
    Reviewer 2: http://dx.doi.org/10.5524/REVIEW.100873