SIMMER employs similarity algorithms to accurately identify human gut microbiome species and enzymes capable of known chemical transformations

Curation statements for this article:
  • Curated by eLife

    eLife Assessment:

    The authors aim to predict bacterial enzymes responsible for drug biotransformation, and the work showcases the potential of this approach as a hypothesis generator for characterizing and validating novel bacterial enzymes in vitro. The authors describe the relevance of an accurate input (in terms of reaction completeness, including cofactors and reaction products) as paramount for the quality of the prediction. The conclusions, however, require additional experimental and non-experimental validations.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

Abstract

Bacteria within the gut microbiota possess the ability to metabolize a wide array of human drugs, foods, and toxins, but the enzymes responsible for these chemical events remain largely uncharacterized due to the time-consuming nature of current experimental approaches. Attempts have been made in the past to computationally predict which bacterial species and enzymes are responsible for chemical transformations in the gut environment, but with low accuracy due to minimal chemical representation and sequence similarity search schemes. Here, we present an in silico approach that employs chemical and protein Similarity algorithms that Identify MicrobioMe Enzymatic Reactions (SIMMER). We show that SIMMER accurately predicts the responsible species and enzymes for a queried reaction, unlike previous methods. We demonstrate SIMMER use cases in the context of drug metabolism by predicting previously uncharacterized enzymes for 88 drug transformations known to occur in the human gut. We validate these predictions on external datasets and provide an in vitro validation of SIMMER’s predictions for metabolism of methotrexate, an anti-arthritic drug. After demonstrating its utility and accuracy, we made SIMMER available as both a command-line and web tool, with flexible input and output options for determining chemical transformations within the human gut. We present SIMMER as a computational addition to the microbiome researcher’s toolbox, enabling them to make informed hypotheses before embarking on the lengthy laboratory experiments required to characterize novel bacterial enzymes that can alter human ingested compounds.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    Bustion and colleagues outline the creation and testing of an in silico method to query gut microbiome databases for genes encoding enzymes predicted to catalyze a reaction of interest, which is provided by the user. Strengths of the tool include attempts to examine nearly 9,000 MetaCyc reactions in a pre-calculated fashion and to rank-order enzymes based on their likelihood of catalyzing a reaction. Substrates, products, and even cofactors, if known, are employed to strengthen the power of the search algorithm, which also employs a hidden Markov model to improve the selection of putative hit enzymes. The authors outline high success rates with examples presented and compare those results with other extant methods, which are reported to perform in a less robust manner. Weaknesses include lack of evidence of success on a more difficult "real world" example. However, the tool outlined is a clear advance over existing methods and will be useful to explore the diversity of chemical transformations performed by commensal microbiota.

    We thank Reviewer 1 for their positive feedback and constructive summary. We agree that a real-world example would add confidence to our findings. We previously demonstrated SIMMER’s utility using published datasets. To expand upon these findings, we added another evaluation on an external dataset (Artacho et al., 2020) and performed new experiments to test SIMMER predictions for methotrexate metabolism into DAMPA and glutamate, a reaction known to be performed by the human microbiome but for which human gut strains and specific gut enzymes were not previously known. Both the new external dataset and our experimental findings validate SIMMER’s predictions of bacteria capable of metabolizing methotrexate, the mainline therapeutic for rheumatoid arthritis patients.

    Reviewer #2 (Public Review):

    This work provides a new computational tool for the systematic characterization of biotransformation reactions in the human gut microbiome: given a biotransformation reaction of interest, it predicts a list of candidate bacterial species, enzymes, and EC identifiers putatively capable of performing the queried reaction. The method is innovative and clearly presented.

    The pipeline, which relies on both chemical and protein similarity algorithms, is in principle applicable to any biotransformation reaction that can be formulated as linked substrates and products (possibly including co-factors). This contrasts with other approaches that, for example, rely only on smaller databases and solely on substrates and chemical similarity. Moreover, SIMMER outperformed two other recently developed methods, against which it was benchmarked for its prediction accuracy when tested on a control test set derived from literature.

    The work interestingly focuses on predicting bacterial enzymes responsible for drug biotransformation, therefore showcasing its potential as a hypothesis generator for characterizing and validating novel bacterial enzymes in vitro.

    The authors correctly describe the relevance of an accurate input (in terms of reaction completeness, including cofactors and reaction products) as paramount for the quality of the prediction.

    The conclusions of this paper are mostly well supported by data, but some aspects of performance evaluation and its generality might benefit from additional elaborations and clarifications.

    1. Great emphasis has been dedicated to the prediction performance of SIMMER over a positive control set derived from the available literature. However, a more extensive description and analysis of false positive results are needed to better understand the possible impact of the (potentially many) false positive predictions listed for each reaction.

    We agree that our analysis would benefit from an assessment of false positives. Unfortunately, current literature usually reports which reactions an enzyme is capable, rather than incapable, of performing. For this reason, we took a conservative approach and defined as false positives all reactions preceding the one that yielded a positive control enzyme sequence. This is now described above in Essential Revisions Response 1.3.
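
    The conservative counting rule described above can be sketched as follows. This is an illustration only, not SIMMER's code: it assumes that "preceding" refers to position in a similarity-ranked list of reactions, and the function name `count_false_positives` and the reaction identifiers are hypothetical.

```python
from typing import Optional, Sequence, Set


def count_false_positives(ranked_reactions: Sequence[str],
                          control_reactions: Set[str]) -> Optional[int]:
    """Count ranked reactions that precede the first control-yielding one.

    ranked_reactions: reaction IDs ordered from most to least similar
        to the query (assumed ordering, see lead-in).
    control_reactions: IDs of reactions that yield the positive control
        enzyme sequence.
    Returns None when the positive control is never recovered.
    """
    for n_preceding, reaction_id in enumerate(ranked_reactions):
        if reaction_id in control_reactions:
            # Every reaction ranked before this one is treated as a
            # false positive under the conservative definition.
            return n_preceding
    return None


# Hypothetical ranking, most similar reaction first.
ranking = ["RXN-A", "RXN-B", "RXN-C", "RXN-D"]
print(count_false_positives(ranking, {"RXN-C"}))  # prints 2
```

    Because uncharacterized enzymes among the preceding reactions might in fact perform the queried chemistry, a count obtained this way is best read as an upper bound.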

    2. The authors imply that the current method is superior to two other methods based on accuracy. However, a more extensive description of the benchmarking results would strengthen these benchmarking efforts.

    We have addressed this concern in Essential Revisions Response 3.

    3. The authors only showcase SIMMER in the context of drug metabolism but claim its applicability to be general enough to also describe other biotransformations in the human gut microbiota. Although in principle believable, the authors could improve the credibility and generalizability of their method by demonstrating another use case, e.g., food compounds, for which extensive metagenomic and metabolomic data are already available from previous gut microbiome studies.

    We agree that assessments of SIMMER’s predictions on food metabolism would improve the generalizability of the method. We have edited the text to focus on drug metabolism, as we believe SIMMER’s application to food metabolism merits a more thorough, future investigation.

    4. Showcasing experimental in vitro validation of SIMMER predicted enzyme(s) could greatly strengthen the relevance of this work.

    We have addressed this in Essential Revisions Response 2.

    5. Throughout the text and the title, a more careful and precise phrasing of the tool's scope (characterization of microbiome-encoded enzymatic reactions and not the identification of novel chemical transformations) would improve the reader's understanding of the work.

    We agree, and have reworded many key phrases in the text, including the title.

    Reviewer #3 (Public Review):

    This manuscript presents a new tool, SIMMER, to predict bacterial enzyme-mediated transformations of compounds, an important and incompletely understood aspect of microbiome drug metabolism. The authors compare their resource to existing resources that allow users to generate hypotheses related to compound toxicity and putative routes of compound metabolism. The authors identify the key innovations of their resource as including full chemical representations of reactions and a novel method to predict an enzyme's EC number (a description of function) from its reaction.

    Strengths

    Generating user-friendly tools to explore existing knowledge of bacterial enzymes and their reactions is important.

    SIMMER is a novel resource where the user provides the substrates and products as input and receives a list of potential microbiome enzymes as output.

    SIMMER includes a novel EC predictor based on reaction rather than based on sequence.

    Weaknesses

    Validation claims are not well supported by the results.

    We have extensively edited the manuscript to better describe our previous computational validations, and we have added new analyses to further evaluate SIMMER. We added an additional validation on an external dataset, an in vitro experimental assessment of SIMMER’s predictions for methotrexate metabolism, two new reactions to the positive control analysis, a false positive rate, and additional comparisons to the two competing methods.

    Need for the user to know both the substrate and the product for a reaction of interest limits the utility of the resource.

    We agree that this is a limitation for the user, but as we show in our Results, relying on substrates alone does not yield appropriate representations of reactions and therefore does not allow for accurate predictions of responsible species/strains and enzymes (i.e., finding True Positives, and confirming associations from previously collected data). We agree that tools requiring only substrates are convenient, but our results show that they are less helpful in finding appropriate metabolism and enzyme predictions. Many studies of biotransformation in the human gut identify the product information or product structure via HPLC, LC-MS, and NMR techniques. In cases where such data was not gathered, or not gathered with enough structural resolution, researchers can use tools such as Biotransformer to make product template predictions before inputting a query to SIMMER. This recommendation is included in the present manuscript’s lines 376–391:

    In instances when DrugBug and MicrobeFDT did make predictions, they suffered from low accuracy (Table 1), which we hypothesized was due to both methods’ reliance on substrate rather than reaction chemistry. Biotransformations involve the relationship between substrate(s), cofactor(s), and an enzyme to yield a particular product(s). As one substrate can exhibit affinity for multiple enzymes, resulting in multiple unique products, sole employment of substrates in a chemical fingerprint does not achieve the resolution necessary to make relevant predictions. To test if SIMMER’s better performance could be attributed to including cofactors and products, we modified our code to run with a chemical representation that includes only the substrate of each positive control reaction. Enzyme prediction accuracy dropped from 88% down to 33%, and EC prediction accuracy dropped from 93% down to 48% (Table 1—source data), supporting the hypothesis that SIMMER’s better performance when compared to DrugBug and MicrobeFDT is due in large part to our using chemical representations that include the full reaction. These results are in line with our previous demonstration that SIMMER clusters enzymatic reaction chemistry only when a full reaction is employed (Figure 2, Figure 2—figure supplement 4).
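
    The point that one substrate can lead to several distinct products, so that a substrate-only fingerprint cannot separate the corresponding reactions, can be illustrated with a toy sketch. It is not SIMMER's implementation: the feature labels, the role-prefixed set representation, and the helper names `reaction_features` and `tanimoto` are all assumptions made for illustration, and real chemical fingerprints are far richer.

```python
from typing import FrozenSet, Iterable, Set


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def reaction_features(substrates: Iterable[Set[str]],
                      products: Iterable[Set[str]],
                      full_reaction: bool = True) -> FrozenSet[str]:
    """Build a toy reaction representation from per-molecule feature sets.

    Features are prefixed with their role so that the same fragment on the
    substrate side and the product side counts as two different features.
    With full_reaction=False only substrate features are kept.
    """
    features = {f"sub:{f}" for mol in substrates for f in mol}
    if full_reaction:
        features |= {f"prod:{f}" for mol in products for f in mol}
    return frozenset(features)


# Hypothetical feature labels: one shared substrate, two different outcomes.
substrate = {"f1", "f2", "f3"}
products_a = [{"f1", "f3"}, {"f3", "f4"}]   # reaction A: cleavage into two products
products_b = [{"f1", "f2", "f3", "f5"}]     # reaction B: one modified product

sub_only = tanimoto(
    reaction_features([substrate], products_a, full_reaction=False),
    reaction_features([substrate], products_b, full_reaction=False),
)
full = tanimoto(
    reaction_features([substrate], products_a),
    reaction_features([substrate], products_b),
)
print(sub_only)  # 1.0: the two reactions are indistinguishable
print(full)      # 0.625: the product side separates them
```

    In this toy setting the substrate-only representation scores the two reactions as identical, whereas including the products lowers their similarity, which mirrors the direction of the accuracy drop reported above.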

    Reliance on homology transfer annotation to predict enzyme function; this approach has important, microbiome-relevant, limitations.

    Please refer to our separate Common_Questions.pdf document, Common question 1: Are EC codes sufficient to select enzyme orthologs within an overall class?

  2. eLife Assessment:

    The authors aim to predict bacterial enzymes responsible for drug biotransformation, and the work showcases the potential of this approach as a hypothesis generator for characterizing and validating novel bacterial enzymes in vitro. The authors describe the relevance of an accurate input (in terms of reaction completeness, including cofactors and reaction products) as paramount for the quality of the prediction. The conclusions, however, require additional experimental and non-experimental validations.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

  3. Reviewer #1 (Public Review):

    Bustion and colleagues outline the creation and testing of an in silico method to query gut microbiome databases for genes encoding enzymes predicted to catalyze a reaction of interest, which is provided by the user. Strengths of the tool include attempts to examine nearly 9,000 MetaCyc reactions in a pre-calculated fashion and to rank-order enzymes based on their likelihood of catalyzing a reaction. Substrates, products, and even cofactors, if known, are employed to strengthen the power of the search algorithm, which also employs a hidden Markov model to improve the selection of putative hit enzymes. The authors outline high success rates with examples presented and compare those results with other extant methods, which are reported to perform in a less robust manner. Weaknesses include lack of evidence of success on a more difficult "real world" example. However, the tool outlined is a clear advance over existing methods and will be useful to explore the diversity of chemical transformations performed by commensal microbiota.

  4. Reviewer #2 (Public Review):

    This work provides a new computational tool for the systematic characterization of biotransformation reactions in the human gut microbiome: given a biotransformation reaction of interest, it predicts a list of candidate bacterial species, enzymes, and EC identifiers putatively capable of performing the queried reaction. The method is innovative and clearly presented.

    The pipeline, which relies on both chemical and protein similarity algorithms, is in principle applicable to any biotransformation reaction that can be formulated as linked substrates and products (possibly including co-factors). This contrasts with other approaches that, for example, rely only on smaller databases and solely on substrates and chemical similarity. Moreover, SIMMER outperformed two other recently developed methods, against which it was benchmarked for its prediction accuracy when tested on a control test set derived from literature.

    The work interestingly focuses on predicting bacterial enzymes responsible for drug biotransformation, therefore showcasing its potential as a hypothesis generator for characterizing and validating novel bacterial enzymes in vitro.

    The authors correctly describe the relevance of an accurate input (in terms of reaction completeness, including cofactors and reaction products) as paramount for the quality of the prediction.

    The conclusions of this paper are mostly well supported by data, but some aspects of performance evaluation and its generality might benefit from additional elaborations and clarifications.

    1. Great emphasis has been dedicated to the prediction performance of SIMMER over a positive control set derived from the available literature. However, a more extensive description and analysis of false positive results are needed to better understand the possible impact of the (potentially many) false positive predictions listed for each reaction.

    2. The authors imply that the current method is superior to two other methods based on accuracy. However, a more extensive description of the benchmarking results would strengthen these benchmarking efforts.

    3. The authors only showcase SIMMER in the context of drug metabolism but claim its applicability to be general enough to also describe other biotransformations in the human gut microbiota. Although in principle believable, the authors could improve the credibility and generalizability of their method by demonstrating another use case, e.g., food compounds, for which extensive metagenomic and metabolomic data are already available from previous gut microbiome studies.

    4. Showcasing experimental in vitro validation of SIMMER predicted enzyme(s) could greatly strengthen the relevance of this work.

    5. Throughout the text and the title, a more careful and precise phrasing of the tool's scope (characterization of microbiome-encoded enzymatic reactions and not the identification of novel chemical transformations) would improve the reader's understanding of the work.

  5. Reviewer #3 (Public Review):

    This manuscript presents a new tool, SIMMER, to predict bacterial enzyme-mediated transformations of compounds, an important and incompletely understood aspect of microbiome drug metabolism. The authors compare their resource to existing resources that allow users to generate hypotheses related to compound toxicity and putative routes of compound metabolism. The authors identify the key innovations of their resource as including full chemical representations of reactions and a novel method to predict an enzyme's EC number (a description of function) from its reaction.

    Strengths:

    • Generating user-friendly tools to explore existing knowledge of bacterial enzymes and their reactions is important.

    • SIMMER is a novel resource where the user provides the substrates and products as input and receives a list of potential microbiome enzymes as output.

    • SIMMER includes a novel EC predictor based on reaction rather than based on sequence.

    Weaknesses:

    • Validation claims are not well supported by the results.

    • Need for the user to know both the substrate and the product for a reaction of interest limits the utility of the resource.

    • Reliance on homology transfer annotation to predict enzyme function; this approach has important, microbiome-relevant, limitations.