Healthy microbiome - moving towards functional interpretation

Abstract

Microbiome-based disease prediction has significant potential as an early, non-invasive marker of multiple health conditions linked to dysbiosis of the human gut microbiota, thanks in part to decreasing sequencing and analysis costs. Microbiome health indices and other computational tools currently proposed in the field are often based on a microbiome’s species richness and rely entirely on taxonomic classification. A resurgent interest in a metabolism-centric, ecological approach has led to an increased understanding of microbiome metabolic and phenotypic complexity, revealing substantial limitations of taxonomy-reliant approaches. In this study, we introduce a new metagenomic health index, developed in response to recent developments in microbiome definitions, to distinguish between healthy and unhealthy microbiomes, focusing here on inflammatory bowel disease (IBD). The novelty of our approach is a shift from a traditional Linnaean phylogenetic classification towards a more holistic consideration of the metabolic functional potential underlying ecological interactions between species. Based on well-explored data cohorts, we compare our method and its performance with the most comprehensive indices to date, the taxonomy-based Gut Microbiome Health Index (GMHI) and the high-dimensional principal component analysis (hiPCA) methods, as well as with the standard taxon- and function-based Shannon entropy scoring. After demonstrating better performance than other methods on the initially targeted IBD cohorts, we retrain our index on an additional 27 datasets obtained from different clinical conditions and validate our index’s ability to distinguish between healthy and disease states using a variety of complementary benchmarking approaches. Finally, we demonstrate its superiority over the GMHI and the hiPCA on a longitudinal COVID-19 cohort and highlight the distinct robustness of our method to sequencing depth.
Overall, we emphasize the potential of this metagenomic approach and advocate a shift towards functional approaches in order to better understand and assess microbiome health, as well as provide directions for future index enhancements. Our method, q2-predict-dysbiosis (Q2PD), is freely available (https://github.com/Kizielins/q2-predict-dysbiosis).

Article activity feed

  1. This work has been peer reviewed in GigaScience (https://doi.org/10.1093/gigascience/giaf015), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Reviewer 2: Saritha Kodikara

    In this study, the authors present a novel metagenomic health index designed to differentiate between healthy and unhealthy microbiomes. This area of research is crucial for developing a non-invasive, cost-effective method to assess patient health status. However, I have several suggestions that I believe will enhance the study and address some key points.

    Main Comments:

    1.) The study would benefit from additional post-analysis to provide greater depth. Although the authors applied their approach to several diseases, they did not elaborate on the significance of individual microbiome features across different diseases. For instance, the GMHI parameters were identified as least important in IBD—does this observation hold universally across all diseases analysed?

    2.) The Q2PD index performed worse in AGP1 than in HMP2 and AGP2. Is there a specific reason for this discrepancy? For example, does the index underperform in the heterogeneous functional landscape presented in AGP1 (Figure 2C)? An explanation for the reduced performance in this cohort would provide valuable insights into the method's performance under varying conditions.

    3.) It would be beneficial to make all processed data and relevant scripts available in a GitHub repository to ensure that the results presented in the paper can be replicated by other researchers.

    4.) When attempting to run the script available at https://github.com/Kizielins/q2-predict-dysbiosis, I encountered an error related to the scikit-learn version. The script appears to be compatible with version 1.2.2, whereas I was using version 1.4.2. Please consider updating the script or providing instructions for resolving version compatibility issues.

    5.) The rationale behind considering only positive correlations when calculating the index is unclear. It would be helpful to clarify why negative correlations were excluded from the index calculations.

    6.) In analysing longitudinal alterations, did the authors account for the dependence of the Q2PD index on its values at previous time points? If not, how do these longitudinal alterations differ from those observed in independent studies?

    7.) For each dataset analysed, additional details would be useful, such as the number of samples, species, functions, core functions, and the number of species remaining after applying the MDFS algorithm.

    8.) On Page 13, the authors state that they chose GMHI as their benchmark because hiPCA and Shannon entropy produced worse results for the HMP2 cohort. However, Supplementary Table 3 indicates that Shannon entropy had a lower p-value than GMHI in the Mann-Whitney U test.
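
The version mismatch reported in main comment 4 could be caught before a saved model is loaded. The following is a minimal sketch, not the authors' code: it assumes, based only on the reviewer's observation, that the model was saved with scikit-learn 1.2.2, and the helper names `parse_version`, `same_minor`, and `check_sklearn` are hypothetical.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.4.2' into (1, 4, 2); trailing non-digits such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def same_minor(installed, expected):
    """True when major and minor version numbers agree."""
    return parse_version(installed)[:2] == parse_version(expected)[:2]


def check_sklearn(expected="1.2.2"):
    """Return a human-readable message about the installed scikit-learn.

    The default of 1.2.2 is an assumption taken from the reviewer's report.
    """
    try:
        installed = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return "scikit-learn is not installed"
    if same_minor(installed, expected):
        return f"scikit-learn {installed} is in the same series as {expected}"
    return (
        f"scikit-learn {installed} found, but the model is assumed to have been "
        f"saved with {expected}; loading the pickle may fail"
    )


if __name__ == "__main__":
    print(check_sklearn())
```

Pickled scikit-learn models are generally not guaranteed to load across library versions, so pinning the version in the installation instructions, or retraining and re-saving the model, would address the underlying issue; a check like this only makes the failure easier to diagnose.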

    Minor comments:

    1. Page 11 Original: "Collecting information on feature importance at every iteration of the cross-validation procedure model, we consistently identified the two GMHI parameters as the least important (Figure 5b)." Suggested: "Collecting information on feature importance at every iteration of the cross-validation procedure model, we consistently identified the two GMHI parameters as the least important (Figure 4b??)."

    2. Page 12 Original: "Most importantly, Q2PD produced visually the highest scores for all healthy in comparison to unhealthy cohorts." Suggested: "Most importantly, Q2PD produced visually the highest median?? scores for all healthy in comparison to unhealthy cohorts."

    3. Page 12 Original: "Q2PD was also the only index to produce a statistically significant difference between Healthy and Obese in HMP2" Suggested: "Q2PD was also the only index to produce a statistically significant difference between Healthy and Obese in AGP2??"

    4. Page 14 Original: "The Q2PD important in all datasets that were included in its training and validation, specifically AGP_1, AGP_2 and HMP2 (Table 1, Supplementary Figure 7)." Suggested: "The Q2PD important in all datasets that were included in its training and validation, specifically AGP_1, AGP_2 and HMP2 (Table 1, Supplementary Figure 8??)."

  2. This work has been peer reviewed in GigaScience (https://doi.org/10.1093/gigascience/giaf015), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Reviewer 1: Vanessa Marcelino

    The manuscript proposes a new method to distinguish between healthy and diseased human gut microbiomes. The topic is timely, as to date, there is no consensus on what constitutes a healthy microbiome. The key conceptual advance of this study is the integration of functional microbiome features to define health. Their new computational approach, q2-predict-dysbiosis (Q2PD), is open source and available on GitHub.

    While the manuscript is conceptually innovative and interesting for the scientific community, there are several major limitations in the current version of this study.

    1. To develop the Q2PD, the authors define features associated with health by comparing healthy microbiome samples with samples from IBD patients. There are many more non-healthy/dysbiotic phenotypes beyond IBD, therefore it is not accurate to use IBD as a synonym for dysbiosis, as done throughout this version of the paper.

    2. The study initially tests the performance of Q2PD against other gut microbiome health indexes (GMHI and hiPCA) using the same data that was used to select the health-associated features of Q2PD. Model performance should be assessed on independent data. In a separate analysis, they do use different datasets (from GMHI and hiPCA), but these datasets seem to be incomplete: the GMHI and hiPCA publications included 10 or more disease categories, and it is unclear why only 4 categories are shown in this study.

    3. While Q2PD does provide visible improvements in differentiating some diseases from healthy phenotypes, the accuracy and sensitivity of Q2PD are not clear. To adopt Q2PD, I would like to know what the chances are that the classification results will be correct.

    4. There is very little documentation on how to use Q2PD. What are the expected outputs? For example, do we need to choose a threshold to define health? Is the method completely dependent on HUMAnN and MetaPhlAn outputs, or are other formats accepted? The test data contain some samples with zero counts. I got an error when trying it with the test data (ValueError: node array from the pickle has an incompatible dtype…).

    Therefore, I recommend including a range of disease categories to develop Q2PD and use independent datasets to validate the model in terms of accuracy and sensitivity. Alternatively, consider focusing this contribution on IBD. Making the code more user friendly will drastically increase the adoption of Q2PD by the community.
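
The accuracy and sensitivity requested above can be reported once a score threshold is fixed. The sketch below is illustrative only: the scores and labels are made up, the 0.6 cutoff is borrowed from the threshold quoted later in this review, and the convention that a score below the threshold predicts "unhealthy" is an assumption rather than documented Q2PD behaviour.

```python
def confusion_counts(scores, is_unhealthy, threshold=0.6):
    """Count true/false positives and negatives, treating 'unhealthy' as positive.

    A sample is predicted unhealthy when its score falls below the threshold
    (an assumed convention for this sketch).
    """
    tp = fp = tn = fn = 0
    for score, unhealthy in zip(scores, is_unhealthy):
        predicted_unhealthy = score < threshold
        if predicted_unhealthy and unhealthy:
            tp += 1
        elif predicted_unhealthy and not unhealthy:
            fp += 1
        elif not predicted_unhealthy and unhealthy:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity (None when undefined)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical index scores for three healthy and three unhealthy samples.
example_scores = [0.9, 0.7, 0.55, 0.3, 0.65, 0.2]
example_labels = [False, False, False, True, True, True]

if __name__ == "__main__":
    counts = confusion_counts(example_scores, example_labels)
    print(counts, summarize(*counts))
```

Computing these metrics on datasets that were not used for feature selection or training, as recommended above, is what makes them informative about expected performance on new samples.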

    Please also use page and line numbers when submitting the next version. Other suggestions:

    Abstract: I recommend replacing 'attributed' with 'linked', as 'attributed' suggests that dysbiosis may be causing (rather than reflecting) disease.

    Results: Please indicate what is meant by 'function' here - it would be good to clarify that this method uses MetaPhlAn's read-based approach to identify metabolic pathways. What is used, pathway completeness or abundance?

    Results regarding Figure 3a are difficult to interpret. Is 'non-negatively correlated' the same as 'positively correlated'? What does the colour gradient represent - their abundance in those groups, or the strength of their correlation?

    "We observed that the prevalence of the pairs positively correlated in health was higher than in a number of disease-associated groups (Figure 3b)". This is a very generalised statement considering that only half of the comparisons were significant. How were co-occurring species selected?

    "To test this, we compared the contributions of MDFS-identified species to "core functions" in different groups (Supplementary Figure 4)." How was this comparison made, based on species correlations? The captions of these figures could include more detail - the caption just says 'Top species contributions to functions.' but how do you define 'top'? What do the colours represent?

    'This finding was congruent with our earlier suspicions of functional plasticity; modulation of function and thus altered connectivity in the interaction network, shifting towards less abundant, non-core functions upon perturbation of homeostasis.' This is reasonable, but I don't understand how you can draw this conclusion from these figures where there seems to be no significant difference between health and disease.

    Section 'Testing q2-predict-dysbiosis, GMHI and hiPCA accuracy of prediction for healthy and IBD individuals'

    What is the difference between the fraction of "core functions" found and the fraction of "core functions" among all functions?

    "Most importantly, Q2PD produced visually the highest scores for all healthy in comparison to unhealthy cohorts". This was not statistically significant. In fact, GMHI finds more significant differences between health and disease than Q2PD.

    Sup. Figure 7 - it would be informative to add the name/description of these metabolites (not just their ID).

    'Although the threshold of 0.6 as determinant of health by the Q2PD was not applicable to the new datasets'. Does the threshold to define health with Q2PD change depending on the dataset? What are the implications of this for the applicability of this index?

    Effects of sequencing depth - this is a very good addition to the paper, the effects of sequencing depth can be profound but are ignored in most studies, so I commend the authors for doing this here. It would be even better, in my opinion, if this was done with the same datasets used to test/compare Q2PD with other methods, as using a different dataset here adds a new layer of confounding factors.
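
One common way to probe sequencing-depth effects on the same datasets is to subsample each sample's reads to a series of fixed depths and recompute the index at each depth. The following is a minimal rarefaction sketch under stated assumptions: it is not the procedure used in the manuscript, the function name `rarefy` is hypothetical, and it works on integer read counts per feature.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a feature -> count mapping.

    Builds an explicit pool of reads, which is simple but memory-hungry for
    deeply sequenced samples; it is meant as an illustration only.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"requested depth {depth} exceeds sample total {total}")
    # One pool entry per read, labelled by the feature it was assigned to.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return Counter(rng.sample(pool, depth))


if __name__ == "__main__":
    sample = {"taxon_a": 50, "taxon_b": 30, "taxon_c": 20}  # made-up counts
    for depth in (10, 50, 100):
        print(depth, dict(rarefy(sample, depth)))
```

Repeating the subsampling with several seeds at each depth would give a sense of how much of any change in the index is due to depth rather than to sampling noise.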

    'the GMHI and the hiPCA produced the opposite trend, wrongly indicating patient recovery.' The difference here is striking, what is driving this trend?

    The Gut Microbiome Wellness Index 2 (GMWI2) is now published. I don't think it needs to be part of the benchmarking, but it could be acknowledged/cited here.

    Methods: More information on how the data were processed is needed - how were the abundance tables normalized? Which output from HUMAnN was used for downstream analyses?
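
As an example of the kind of detail being requested, one widely used normalization is total-sum scaling, which converts each sample's values to relative abundances. The sketch below is an assumption for illustration, not a description of the authors' pipeline; the function name `to_relative_abundance` is hypothetical, and samples with a zero total (such as those noted in the test data) are flagged rather than divided.

```python
def to_relative_abundance(sample):
    """Scale a feature -> abundance mapping so that its values sum to 1.

    Returns None for a sample whose total is zero, so the caller can decide
    whether to drop or report it instead of dividing by zero.
    """
    total = sum(sample.values())
    if total == 0:
        return None
    return {feature: value / total for feature, value in sample.items()}


if __name__ == "__main__":
    table = {  # made-up abundances for two samples
        "sample_1": {"pathway_x": 2, "pathway_y": 6},
        "sample_2": {"pathway_x": 0, "pathway_y": 0},
    }
    for name, sample in table.items():
        print(name, to_relative_abundance(sample))
```

Stating which normalization was applied, and how empty samples were handled, would let readers reproduce the input tables exactly.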

    To ensure reproducibility, please provide the scripts/code used for analyses and figures.