Recurrent neural networks enable design of multifunctional synthetic human gut microbiome dynamics
Curation statements for this article:
Curated by eLife
Evaluation Summary:
The ultimate goal of this work is to apply machine learning to learn from experimental data on temporal dynamics and functions of microbial communities to predict their future behavior and design new communities with desired functions. Using a significant amount of experimental data, the authors suggest that their method outperforms a commonly used approach. Overall, the work is potentially of broad interest to those working on microbiome prediction and design.
(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)
Abstract
Predicting the dynamics and functions of microbiomes constructed from the bottom-up is a key challenge in exploiting them to our benefit. Current models based on ecological theory fail to capture complex community behaviors due to higher-order interactions, do not scale well with increasing complexity and in considering multiple functions. We develop and apply a long short-term memory (LSTM) framework to advance our understanding of community assembly and health-relevant metabolite production using a synthetic human gut community. A mainstay of recurrent neural networks, the LSTM learns a high-dimensional data-driven nonlinear dynamical system model. We show that the LSTM model can outperform the widely used generalized Lotka-Volterra model based on ecological theory. We build methods to decipher microbe-microbe and microbe-metabolite interactions from an otherwise black-box model. These methods highlight that Actinobacteria, Firmicutes and Proteobacteria are significant drivers of metabolite production whereas Bacteroides shape community dynamics. We use the LSTM model to navigate a large multidimensional functional landscape to design communities with unique health-relevant metabolite profiles and temporal behaviors. In sum, the accuracy of the LSTM model can be exploited for experimental planning and to guide the design of synthetic microbiomes with target dynamic functions.
Author Response:
Reviewer #1 Public Review:
This is an interesting study demonstrating the application of deep learning to model microbiome dynamics of the human gut community, improving on existing approaches (for example regarding scalability). Furthermore, the model is able to better predict microbe-microbe and microbe-metabolite interactions as compared to classical approaches like ODEs or regression. The authors show that their LSTM-based model is able to successfully predict the abundance not only at the final time step but also at intermediate time steps. In general, the authors did a good job in demonstrating the strengths of their proposed approach. The major findings were carefully interpreted and challenged through multiple tests (explainability and sensitivity analysis). As the microbiome is not my primary area of expertise, I cannot comment on the validity of the biological interpretations.
The methods section (machine learning part) is rather short and in my opinion does not provide sufficient details. Since generating deep learning-based models can be rather challenging, it would be valuable to explain how the model was obtained and how the parameter tuning was done. Furthermore, the choice of the LIME and CAM as explanation methods seems arbitrary. It is unclear why these methods are preferable to other methods.
Thank you for noting the significance of the proposed work. We aim to emphasize the power of nonparametric models that can capture the input-output relations much better than simple, albeit explainable, models based on ecological theory. However, by incorporating explainability methods into the LSTM model, we are able to provide various biological insights that are otherwise only possible with models whose parameters can be directly interpreted by their effects on biological system behaviors. There are indeed other methods for explaining deep networks, notably the Shapley explainability method [2], which is substantially more computationally burdensome than methods such as CAM or LIME [3, 4]. LIME and CAM are based on first-order perturbations around the already learned model, and can be used to depict local model behavior with little to no computational burden. In contrast, an exact computation of Shapley values for a K-dimensional input requires evaluating 2^K possible coalitions of the feature values, and the “absence” of a feature has to be simulated by drawing random instances, which increases the variance of the estimated Shapley values. Thus, we incorporated LIME and CAM for their ease of implementation and simplicity. In particular, the CAM-like approach requires a single backpropagation pass (and the information is already available during the training process). In the revision we have included further discussion of our motivation for selecting the LIME and CAM-like approaches in the Section titled “Understanding Relationships Between Variables Using LIME”.
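The cost argument above can be made concrete with a small sketch. It is not the code used in the paper: `toy_model`, `exact_shapley` and `local_linear_surrogate` are hypothetical names, absent features are replaced by a fixed baseline rather than by random draws, and the surrogate is only a LIME-style weighted linear fit on Gaussian perturbations, not the LIME package itself.

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; an absent feature takes its baseline value."""
    k = len(x)
    phi = np.zeros(k)
    for i in range(k):
        others = [j for j in range(k) if j != i]
        for size in range(k):
            # Shapley weight for coalitions of this size: |S|! (K-|S|-1)! / K!
            weight = math.factorial(size) * math.factorial(k - size - 1) / math.factorial(k)
            for subset in itertools.combinations(others, size):
                idx = np.array(subset, dtype=int)
                z = baseline.copy()
                z[idx] = x[idx]
                without_i = f(z)
                z[i] = x[i]
                phi[i] += weight * (f(z) - without_i)
    return phi

def local_linear_surrogate(f, x, n_samples=500, scale=0.1, seed=0):
    """LIME-style sketch: weighted linear fit to f on small perturbations of x."""
    rng = np.random.default_rng(seed)
    perturbed = x + scale * rng.standard_normal((n_samples, len(x)))
    y = np.array([f(p) for p in perturbed])
    dist2 = np.sum((perturbed - x) ** 2, axis=1)
    w = np.exp(-dist2 / (2 * scale ** 2 * len(x)))  # proximity kernel
    design = np.hstack([perturbed - x, np.ones((n_samples, 1))])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(design * sw, y * sw[:, 0], rcond=None)
    return coef[:-1]  # local slopes; the last entry is the intercept

def toy_model(z):
    # Hypothetical stand-in for a trained model, with one pairwise interaction.
    return 2.0 * z[0] + z[1] * z[2] - z[3]

x = np.array([1.0, 2.0, 3.0, 0.5])
baseline = np.zeros(4)
phi = exact_shapley(toy_model, x, baseline)
slopes = local_linear_surrogate(toy_model, x)
n_coalitions = 2 ** len(x)  # grows exponentially with the input dimension
```

The exact computation visits every coalition, so its cost doubles with each added feature, whereas the surrogate needs a fixed number of model evaluations chosen by the user.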
Reviewer #2 Public Review:
Overall, this is a very strong paper that represents an important contribution to the field of predicting microbiome dynamics and function using ML. In terms of methodology, I appreciate how the team integrates quantitative measurements, dynamical modeling, and machine learning.
Thank you for your encouraging remarks and for accurately summarizing our work. Indeed, the team has benefited immensely from this collaboration, which involves different facets of the proposed work: microbiome experiments, computational biology and artificial intelligence.
Reviewer #3 Public Review:
Summary: The authors ultimately wish to construct microbiomes with desired functions. To that end they have combined an LSTM model and FF neural network for microbiome and metabolite abundance data that can predict both microbial dynamics and their functional capacity (metabolic potential) over time. Their model is compared to a gLV composite model. Model performance is compared on synthetic data and real data. Sensitivity analysis was performed on the models to determine which predictions were most sensitive to the amount of training data and what taxa or taxa pairs were most important for model prediction. The authors also incorporated extra experiments after learning on the original data to then test how well their model could predict functional capacity on new test data. The main findings were that Bacteroides has broad metabolic capability with the model highlighting specific species with more specialized metabolic capabilities.
Thank you. This summary is accurate. However, we would like to highlight a few additional biological findings from our paper: (1) pairwise interactions influence succinate and acetate, whereas single species are the major drivers of butyrate and lactate (Figure 4c,d); (2) communities can display similar endpoint metabolite profiles but disparate dynamic behaviors (Clusters 2 and 3 in Figure 5c,d), which may have important health implications (e.g. health-relevant metabolites which display non-monotonic trends in their dynamics and trigger dysbiosis by reaching a transient maximum concentration that has negative health consequences); and (3) individual species can transiently impact metabolite dynamics (e.g. PC and BA in Figure 5j).
Points of weakness: 
It is unclear why an LSTM would be a good model for the microbiome
We thank the reviewer for this question. LSTMs are a good model for the microbiome because (1) LSTM is a natural choice for modeling time-series data; (2) LSTMs are highly flexible models that can capture complex interaction networks (i.e. higher-order interactions) and feedback loops in a way that other ecological models cannot because they are universal function approximators; (3) LSTMs can be modified to capture additional system variables such as environmental inputs (e.g. metabolites, pH, oxygen). In addition, LSTMs may have some advantages over traditional RNNs because they can capture long-term dependencies via additional parameters that adjust how much earlier time points impact predictions at later time points in a time-series. We have updated our introduction to provide this motivation for using LSTMs to model microbiomes.
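The "additional parameters" mentioned above are the gates of the LSTM cell. A minimal NumPy sketch of one textbook LSTM step follows; it is not the architecture or code used in the paper, and `lstm_step`, the layer sizes and the hand-set biases are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. Shapes: W (4H, D), U (4H, H), b (4H,)."""
    H = h_prev.shape[0]
    gates = W @ x_t + U @ h_prev + b
    i = sigmoid(gates[0:H])          # input gate: how much new information enters
    f = sigmoid(gates[H:2 * H])      # forget gate: how much old memory is kept
    o = sigmoid(gates[2 * H:3 * H])  # output gate: how much memory is exposed
    g = np.tanh(gates[3 * H:4 * H])  # candidate update
    c_t = f * c_prev + i * g         # cell state carries long-term information
    h_t = o * np.tanh(c_t)           # hidden state used for prediction
    return h_t, c_t

# Hypothetical sizes and hand-set weights: the forget gate is held open and the
# input gate closed, so the cell state should persist across time steps.
D, H = 3, 2
W = np.zeros((4 * H, D))
U = np.zeros((4 * H, H))
b = np.zeros(4 * H)
b[0:H] = -20.0      # input gate close to 0
b[H:2 * H] = 20.0   # forget gate close to 1

c0 = np.array([0.7, -0.3])
h, c = np.zeros(H), c0.copy()
for _ in range(10):
    h, c = lstm_step(np.ones(D), h, c, W, U, b)
```

In a trained network these gate values are learned from data and depend on the inputs, which is what lets earlier time points influence later predictions to a varying degree.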
It is unclear what aspects of the dynamics are long-term, and whether the experiments capture this long-term effect
The LSTM has advantages over other microbiome models such as gLV since it captures long-term dynamics. The LSTM is shown to be both flexible and better (than the most commonly used gLV model) at predicting the transient as well as the long-term dynamics. For instance, Figure 2-figure supplement 1b represents one such community comprising 11 species, where the steady-state (long-term) dynamics are accurately captured by our LSTM models. Recall that the experimental data consist of time-series measurements sampled up to t = 60 hr, which is a reasonable time frame to evaluate long-term dynamics. In addition, the communities were passaged (aliquots of the communities were transferred into fresh media every 24 hr), which allows characterization of the communities over a longer timescale. The model can be rolled forward in time to estimate even longer-term behavior; however, we currently don’t have data to evaluate the model’s predictions beyond approximately 60 hr.
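Rolling a model forward in time, as described above, amounts to feeding each prediction back in as the next input. The sketch below shows only that loop; `roll_forward` and `toy_step` are hypothetical names, and `toy_step` is a simple linear map standing in for a trained one-step predictor.

```python
import numpy as np

def roll_forward(step_fn, x0, n_steps):
    """Apply a one-step predictor repeatedly, feeding each output back in."""
    states = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        states.append(step_fn(states[-1]))
    return np.stack(states)

def toy_step(x):
    # Hypothetical stand-in for a trained one-step model: relaxes toward 2.0.
    return 0.5 * x + 1.0

trajectory = roll_forward(toy_step, x0=[0.0, 4.0], n_steps=20)
```

Because each step consumes the previous prediction, any one-step error is carried into later steps, which is one reason predictions beyond the measured time window need experimental data to be evaluated.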
Discussions around the LSTM model and some ML and dynamical systems concepts are inaccurate (LSTM with one hidden unit is not really a “deep” model, gLV models are linear in the parameters and thus the parameters are trivial to solve for given the microbial abundances)
We respectfully disagree with the claim of the reviewer that our implementation of the LSTM is not a deep model. Please refer to our response to your comment 8 for detailed explanation.
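The reviewer's remark that gLV models are linear in the parameters can be illustrated with a short sketch. For the standard gLV form dx_i/dt = x_i (r_i + Σ_j a_ij x_j), the derivative of log x_i equals r_i + Σ_j a_ij x_j, so the growth rates and interaction coefficients can be estimated by ordinary least squares. The example assumes dense, noise-free simulated trajectories and hypothetical parameter values; it is not the fitting procedure used in the paper, and finite differences on sparse or noisy experimental data would be far less forgiving.

```python
import numpy as np

def simulate_glv(x0, r, A, dt=0.01, steps=500):
    """Integrate dx/dt = x * (r + A @ x) with a fixed-step RK4 scheme."""
    def rhs(x):
        return x * (r + A @ x)
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        x = traj[-1]
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * dt * k1)
        k3 = rhs(x + 0.5 * dt * k2)
        k4 = rhs(x + dt * k3)
        traj.append(x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6)
    return np.array(traj)

def fit_glv(trajectories, dt):
    """Least-squares estimate of (r, A) from d(log x)/dt = r + A @ x."""
    rows, targets = [], []
    for traj in trajectories:
        log_x = np.log(traj)
        dlog = (log_x[2:] - log_x[:-2]) / (2 * dt)  # central differences
        x_mid = traj[1:-1]
        rows.append(np.hstack([np.ones((len(x_mid), 1)), x_mid]))
        targets.append(dlog)
    theta, *_ = np.linalg.lstsq(np.vstack(rows), np.vstack(targets), rcond=None)
    return theta[0], theta[1:].T  # intercepts are r; transposed slopes are A

# Hypothetical two-species system and initial conditions.
r_true = np.array([0.5, 0.3])
A_true = np.array([[-1.0, -0.2],
                   [0.1, -0.8]])
starts = [[0.1, 0.1], [1.0, 0.2], [0.2, 1.0], [0.8, 0.9]]
trajs = [simulate_glv(x0, r_true, A_true) for x0 in starts]
r_hat, A_hat = fit_glv(trajs, dt=0.01)
```

Several initial conditions are used so that the regression sees enough variation in abundances; a single trajectory sitting near steady state would leave the coefficients poorly determined.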
Not enough detail is given regarding the LSTM model or the composite model to understand them
Thank you for your suggestion. The Methods section has been revised substantially to address the lack of details about the LSTMs and the composite model. Please refer to our detailed response to your comment 4.
part of the composite model is in MATLAB and could not be tested
While MATLAB is not free, it is a very widely used software package with unique capabilities. For readers who do not have access to MATLAB, OCTAVE is an open access clone that can be used to verify our results. Please refer to our detailed response to your comment 6.
authors claim that their model is interpretable, but it is no more interpretable than any differentiable model that can use gradients to open the lid after training
We thank the reviewer for this comment. While we did not claim our model to be interpretable, rather that we used methods to interpret the trained models, we agree that the methods that we used to extract biological information from LSTMs could be used with a wide array of model types. To clarify our use of interpretable methods, we have created a new subsection of the Results entitled “Using local interpretable model-agnostic explanations to decipher interactions” where we have expanded our discussion. In addition to the specific interpretations that we have obtained from our local interpretability (LIME) analysis, we have included the following sentences at the beginning of the subsection: “One of the commonly noted limitations of machine learning models is their lack of interpretability for extracting biological information about the system. Fortunately, generally applicable tools have been developed to aid in model interpretation. Thus, we sought to use such methods to decipher key relationships among variables within our LSTM to deepen our biological understanding of the system.”
The authors are commended on their extensive experimental integration and some aspects of validation. The models however are missing enough details in the text to understand how they were used. Also, the comparison seems a little unfair. From reading the text it appears that the LSTM+FF model was trained jointly, whereas, the composite model first learns from the microbiome data and then the metabolite prediction component is trained after the gLV model parameters are held fixed. Any model trained jointly will have an advantage over one trained in this two-step process. If the main claim of the paper is that the LSTM model is better than a gLV model then the comparison should be more systematic and fair.
We appreciate the acknowledgement of our efforts to integrate experiments and modeling. As we have commented elsewhere in this review, we have done the following to clarify the details of our modeling:
We have reorganized our methods section to make it easier to find relevant details. We have created three sections: “Experimental Methods”, “Computational Methods”, and “Specific Applications of Computational Methods”. This final section has subsections describing all analyses presented in the paper with references to which Figure the methods section is discussing.
We have added details about the ground truth models and train/test methods used for our in silico comparison of the gLV and LSTM in predicting species abundance in the section labeled “Comparison of gLV and LSTM in silico (Figure 1)”
We have clarified the methods section describing the composite model used for comparison with the LSTM for predicting species abundance and metabolite production (Methods section “Composite Model: Regression Models for Predicting Metabolite Concentrations (Figure 3)”).
In regards to a fairer comparison between gLV composite model and LSTM, one of the weaknesses of the composite model is that there is no feedback between the species variables and the metabolite variables. The metabolite variables are a function of the endpoint species abundance, but the species abundances are not a function of the metabolite concentrations. Thus, even if we were to devise an end-to-end training scheme, we wouldn’t expect the results to change. We have now updated our manuscript to mention this key advantage of the LSTM model. However, to make one “fairer” comparison, we tried replacing the regression model in the composite with a Feed Forward Network or a Random Forest Regressor as described earlier in our response:
We have updated the comparisons in Figure 3-figure supplement 3a to include the prediction accuracy for gLV+FF and gLV+Random Forest Regressor. While some improvement in the prediction of succinate, lactate, and acetate was observed relative to the original composite model, none of the new models outperformed the LSTM in all four metabolites. We have added a sentence discussing this result to the main text: “Additionally, replacing the regression portion of the composite model with either a Random Forest Regressor or a Feed Forward Network did not improve the metabolite prediction accuracy beyond that of the LSTM (Figure 3-figure supplement 3a).”

Reviewer #1 (Public Review):
This is an interesting study demonstrating the application of deep learning to model microbiome dynamics of the human gut community, improving on existing approaches (for example regarding scalability). Furthermore, the model is able to better predict microbe-microbe and microbe-metabolite interactions as compared to classical approaches like ODEs or regression. The authors show that their LSTM-based model is able to successfully predict the abundance not only at the final time step but also at intermediate time steps. In general, the authors did a good job in demonstrating the strengths of their proposed approach. The major findings were carefully interpreted and challenged through multiple tests (explainability and sensitivity analysis). As the microbiome is not my primary area of expertise, I cannot comment on the validity of the biological interpretations.
The methods section (machine learning part) is rather short and in my opinion does not provide sufficient details. Since generating deep learning-based models can be rather challenging, it would be valuable to explain how the model was obtained and how the parameter tuning was done. Furthermore, the choice of the LIME and CAM as explanation methods seems arbitrary. It is unclear why these methods are preferable to other methods.

Reviewer #2 (Public Review):
In this study, Baranwal et al demonstrate the use of a long short-term memory neural network (LSTM) to predict temporal dynamics of microbial communities and metabolic functions. The key points of the study include:
1. The LSTM model outperforms the standard generalized Lotka-Volterra model in predicting both experimental data and simulated data (from another gLV model).
2. Once trained, the LSTM model allows accelerated prediction of microbial community dynamics.
3. The predictions from LSTM can enable the generation of biological insights by proper analysis.
Overall, this is a very strong paper that represents an important contribution to the field of predicting microbiome dynamics and function using ML. In terms of methodology, I appreciate how the team integrates quantitative measurements, dynamical modeling, and machine learning.

Reviewer #3 (Public Review):
The authors ultimately wish to construct microbiomes with desired functions. To that end they have combined an LSTM model and FF neural network for microbiome and metabolite abundance data that can predict both microbial dynamics and their functional capacity (metabolic potential) over time. Their model is compared to a gLV composite model. Model performance is compared on synthetic data and real data. Sensitivity analysis was performed on the models to determine which predictions were most sensitive to the amount of training data and what taxa or taxa pairs were most important for model prediction. The authors also incorporated extra experiments after learning on the original data to then test how well their model could predict functional capacity on new test data. The main findings were that Bacteroides has broad metabolic capability with the model highlighting specific species with more specialized metabolic capabilities.
Strengths:
 Paper integrates extensive experimental data.
 Sensitivity analysis was performed on the model (which is often neglected), the reviewer appreciates this extra step.
 LSTM model was in python and notebooks could be downloaded and run with ease.
Points of weakness:
 It is unclear why an LSTM would be a good model for the microbiome
 It is unclear what aspects of the dynamics are long-term, and whether the experiments capture this long-term effect
 Discussions around the LSTM model and some ML and dynamical systems concepts are inaccurate (LSTM with one hidden unit is not really a "deep" model, gLV models are linear in the parameters and thus the parameters are trivial to solve for given the microbial abundances)
 Not enough detail is given regarding the LSTM model or the composite model to understand them
 part of the composite model is in Matlab and could not be tested
 authors claim that their model is interpretable, but it is no more interpretable than any differentiable model that can use gradients to open the lid after training
An appraisal of whether the authors achieved their aims, and whether the results support their conclusions: The authors are commended on their extensive experimental integration and some aspects of validation. The models however are missing enough details in the text to understand how they were used. Also, the comparison seems a little unfair. From reading the text it appears that the LSTM+FF model was trained jointly, whereas, the composite model first learns from the microbiome data and then the metabolite prediction component is trained after the gLV model parameters are held fixed. Any model trained jointly will have an advantage over one trained in this two-step process. If the main claim of the paper is that the LSTM model is better than a gLV model then the comparison should be more systematic and fair.
Likely impact on the field: The tight coupling of new experiments with computational methods is important. All too often a tool is made but only shown to work on data not tailored to the tool. Here, both are designed together.
