CAT Bridge: An Efficient Toolkit for Gene-Metabolite Association Mining from Multi-Omics Data

Abstract

Background

With advancements in sequencing and mass spectrometry technologies, multi-omics data can now be easily acquired for understanding complex biological systems. Nevertheless, substantial challenges remain in determining the association between gene-metabolite pairs due to the non-linear and multifactorial interactions within cellular networks. The complexity arises from the interplay of multiple genes and metabolites, often involving feedback loops and time-dependent regulatory mechanisms that are not easily captured by traditional analysis methods.

Findings

Here, we introduce Compounds And Transcripts Bridge (abbreviated as CAT Bridge, available at https://catbridge.work), a free, user-friendly platform for longitudinal multi-omics analysis that efficiently identifies transcripts associated with metabolites using time-series omics data. To evaluate the association of gene-metabolite pairs, CAT Bridge is a pioneering work benchmarking a set of statistical methods spanning causality estimation and correlation coefficient calculation for multi-omics analysis. Additionally, CAT Bridge features an artificial intelligence (AI) agent to assist users in interpreting the association results.

Conclusions

We applied CAT Bridge to experimentally obtained Capsicum chinense (chili pepper) and public human and Escherichia coli (E. coli) time-series transcriptome and metabolome datasets. CAT Bridge successfully identified genes involved in the biosynthesis of capsaicin in C. chinense. Furthermore, case study results showed that the convergent cross mapping (CCM) method outperforms traditional approaches in longitudinal multi-omics analyses. CAT Bridge simplifies access to various established methods for longitudinal multi-omics analysis, and enables researchers to swiftly identify associated gene-metabolite pairs for further validation.

Article activity feed

  1. Conclusions We applied CAT Bridge to experimentally obtained Capsicum chinense (chili pepper) and public human and Escherichia coli (E. coli) time-series transcriptome and metabolome datasets. CAT Bridge successfully identified genes involved in the biosynthesis of capsaicin in C. chinense. Furthermore, case study results showed that the convergent cross mapping (CCM) method outperforms traditional approaches in longitudinal multi-omics analyses. CAT Bridge simplifies access to various established methods for longitudinal multi-omics analysis, and enables researchers to swiftly identify associated gene-metabolite pairs for further validation.

    Reviewer 2: Jitendra Kumar Barupal

    Reviewer Comments:

    To the authors,

    Thank you for the opportunity to review the manuscript GIGA-D-24-00083. The authors created a tool to predict associations between genes and metabolites using various algorithms. The authors provide the tool as a web application and as a Python package. For exploring the reciprocal relationship between genes and metabolites, i.e. which metabolites can change which genes or vice versa, this tool can serve as a toolkit for biologists and bioinformaticians.

    The tool is especially applicable where the relationship between changes in genes and metabolites is not direct, since many complex mechanisms exist, e.g. epigenetics or polymorphism. The tool can therefore be an alternative to other available tools.

    Also, the manuscript brings the community's focus to causal relationships instead of purely correlation-based relationships. The tool uses temporal causality algorithms to predict relationships between genes and metabolites.

    However, I recommend major revisions before publication. Here are my reasons and comments for the revisions:

    General issues with web accessibility and package installation:

    1. There are concerns about web accessibility, as web browsers flag the connection as insecure. This may stem from geographical restrictions or the absence of HTTPS certification. Addressing these issues would ensure secure access to the server.
    2. Although the client application launched successfully from the git repository as a Python module, no results were generated after launching. It is suggested that the authors distribute the tool as a Docker image to facilitate seamless usage, eliminating concerns regarding dependencies and version compatibility.

    Other comments:

    1. There are inconsistencies regarding data preprocessing. While the manuscript mentions that the tool will handle preprocessing, it also indicates that users need to provide processed files. Clarification is needed on whether preprocessing is required. It seems that the tool requires preprocessed data.
    2. For clarity, use "causality and correlation" instead of "causality/correlation" algorithms.
    3. Can the tool process any new temporal numerical data series, or does it specifically filter for genes? For instance, if I provide a list of proteins along with a list of genes, will I receive the association between them? It is suggested to include this in the discussion section.
    4. Does the tool offer the capability to generate a causal diagram or network from these vectors, thereby providing visual support for the authors' assertion regarding the causal relationship between metabolites and genes? If the authors are working in this direction, it is suggested that this information be added to the discussion section.
    5. What definition of causal relationship did the authors use, and could they provide a citation for it? Was predictability or some other criterion used to establish causal relationships? Please include the definition or criteria in the introduction and methods sections.
    6. What are the minimum and maximum numbers of time points (intervals) for input files? For example, will the tool work if I provide only two time points, or if I provide 48 time points? Please include this information in the methods section.
    7. What is the influence of the number of time points on the vector relationship presented in the paper? Have any studies by the authors addressed this question? Please include this in the results and discussion.
    8. Could the authors clarify which heuristic algorithm was employed for ranking the genes? Additionally, can they elaborate on how their approach to gene ranking is heuristic rather than relying on mathematical optimization or algorithmic methods? Clarification of the term "heuristic" would be beneficial.
    9. Could the authors offer an example from studies conducted on yeast, E. coli, or other simple organisms, demonstrating how changes in gene sequences have readily been observed to affect metabolite levels? Please include that in the results section.
    10. Does the tool generate a vector indicating many-to-many relationships or one-to-one relationships? In other words, does it reveal whether one gene is associated with many metabolites, and vice versa, or does it establish a single gene-metabolite relationship? Please include this in the results section. Also, in the discussion section, please include examples of applications of these relationships in various fields, e.g. metabolic engineering or cancer metabolism.
    11. Table 1 compares the features of CAT Bridge with other available methods. It should also cover features provided by other tools that are not available in the authors' tool, such as knowledge-driven integration or integration with a third-party database. Additionally, it should address the limitation posed by the requirement for time-series data, which is not just a strength but also a challenge, particularly for epidemiology studies where multiple time series for gene expression may not be feasible.
    12. Please use alternative phrases to "Self-generated data," such as "experimentally obtained data," to clarify that the authors are using data acquired in the lab to validate the tool (e.g. lines 42, 223, and 492).
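
Comments 6 and 7 above ask how the number of time points affects the inferred gene-metabolite relationship. The following is a minimal illustrative sketch, not CAT Bridge's API or the authors' method: the function name `lagged_pearson`, the three-point minimum, and the toy series are all assumptions made for illustration. It shows why a plain correlation is uninformative with only two time points, and how a lagged correlation can recover a delayed association when more points are available.

```python
import numpy as np

def lagged_pearson(gene, metabolite, max_lag):
    """Pearson r between gene[t] and metabolite[t + lag] for lag = 0..max_lag.

    Hypothetical helper for illustration only. Returns NaN for a lag when
    fewer than three overlapping time points remain or a series is constant.
    """
    gene = np.asarray(gene, dtype=float)
    metabolite = np.asarray(metabolite, dtype=float)
    results = {}
    for lag in range(max_lag + 1):
        x = gene[: len(gene) - lag]   # gene values at time t
        y = metabolite[lag:]          # metabolite values at time t + lag
        if len(x) < 3 or np.std(x) == 0 or np.std(y) == 0:
            results[lag] = float("nan")
        else:
            results[lag] = float(np.corrcoef(x, y)[0, 1])
    return results

# Toy data (assumed): the metabolite follows the gene with a delay of two steps.
gene = [1, 3, 2, 5, 4, 6, 3, 7, 5, 8, 6, 9]
metabolite = np.roll(gene, 2)

by_lag = lagged_pearson(gene, metabolite, max_lag=3)
best_lag = max(by_lag, key=by_lag.get)  # lag with the highest correlation

# With only two time points, any two non-constant series correlate at
# +1 or -1 (up to rounding), so the value carries no information.
two_point_r = float(np.corrcoef([1.0, 2.0], [5.0, 3.0])[0, 1])
```

In this toy setting the correlation peaks at the lag used to construct the data, while the two-point case is degenerate regardless of the underlying biology. Whether and how CAT Bridge enforces a minimum number of time points is exactly what the reviewer asks the authors to document.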

  2. Abstract. Background: With advancements in sequencing and mass spectrometry technologies, multi-omics data can now be easily acquired for understanding complex biological systems. Nevertheless, substantial challenges remain in determining the association between gene-metabolite pairs due to the non-linear and multifactorial interactions within cellular networks. The complexity arises from the interplay of multiple genes and metabolites, often involving feedback loops and time-dependent regulatory mechanisms that are not easily captured by traditional analysis methods. Findings: Here, we introduce Compounds And Transcripts Bridge (abbreviated as CAT Bridge, available at https://catbridge.work), a free, user-friendly platform for longitudinal multi-omics analysis that efficiently identifies transcripts associated with metabolites using time-series omics data. To evaluate the association of gene-metabolite pairs, CAT Bridge is a pioneering work benchmarking a set of statistical methods spanning causality estimation and correlation coefficient calculation for multi-omics analysis. Additionally, CAT Bridge features an artificial intelligence (AI) agent to assist users in interpreting the association results.

    Reviewer 1: Tara Eicher

    Reviewer Comments:

    The authors introduce a useful tool (CAT Bridge) for integrating multiple causal and correlative analyses for multi-omics integration, which also includes a visualization and LLM component. The authors further provide two case studies (human and plant) illustrating the utility of CAT Bridge. I believe that this work should be published, as it contributes to the field of multi-omics analysis.

    However, I am very concerned about the lack of description regarding the LLM. As explained by Mittelstadt et al (https://www.nature.com/articles/s41562-023-01744-0), LLMs do not always provide factual answers. The authors need to justify the use of the LLM to determine the relevance of a gene-metabolite association. In particular, the authors should add to the main text (or at least the supplementary) a detailed description of the prompt construction and should justify why this prompt is expected to result in factual information. Furthermore, the authors should discuss the caveats of using LLMs in this context, starting with the linked article above. I believe that the manuscript will only be publishable once this concern is addressed.

    In addition, the authors are recommended to address the following more minor concerns:

    Implementation:

    1. Your "example file" links at https://catbridge.work are broken. Please fix this.

    Abstract:

    1. Line 32: "Nevertheless, substantial challenges remain in determining the association between gene-metabolite pairs due to the complexity of cellular networks." This is not a clear statement. What about the complexity of cellular networks presents challenges in determining the associations?
    2. Make sure you are using present tense consistently, not past tense (Line 39).
    3. Please use the scientific name with the common name in parentheses as follows: Capsicum chinense (chili pepper). Use only the scientific name throughout the rest of the document (Line 41).

    Background:

    1. Line 56: "Background" should not be plural.
    2. Lines 59-60: More comprehensive than what? Please elaborate here.
    3. In Line 60, please include and familiarize yourself with the following reference: Eicher, T., G. Kinnebrew, A. Patt, K. Spencer, K. Ying, Q. Ma, R. Machiraju and E. A. Mathé (2020). "Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources." Metabolites 10: 202.
    4. Lines 67-68: Citation needed.
    5. Line 72: Please use the scientific name with the common name in parentheses.
    6. Lines 74-77: Citations needed.
    7. Lines 77-78: Give an example of biologically naïve conclusions from purely data-driven strategies.
    8. Line 78: Discuss how the machine learning models could address the drawbacks of the correlation models and vice versa.

    Materials and Methods:

    1. It seems that CAT Bridge needs to be run on one metabolite at a time. In this case, I would not use the term "gene-metabolite pair association" in Line 104, but rather "associations between genes and the target metabolite".
    2. Line 115: Clearly state which of these methods are non-linear and which address the lag issue.
    3. Line 136: Your figures are out of order (Figure 1B comes after Figure 2B).
    4. Please take a look at the Minimum Standards Reporting Checklist (https://academic.oup.com/gigascience/pages/Minimum_Standards_of_Reporting_Checklist). In particular:
       a. In the section starting at Line 153, list the number of seedlings used.
       b. Were all timepoints collected from all seedlings? List the total number of samples.
       c. How many mg were collected per sample (can use a range here)?
       d. 3 biological replicates per seedling? Give more detail here.
       e. What machine was used for the ultrasonic process? If frequency settings are permitted by the machine, list the settings used.
       f. How many of the 28 younger and 54 older adults had both transcriptome and metabolome data?
    5. Line 209: "Younger" and "older" are better terms.

    Results:

    1. Line 248: How does the AI agent analyze the functional annotations?
    2. Lines 281-282: "This illustrates the advantage of causal relationship modeling methods over traditional methods".
    3. Line 290: Please also include the updated IntLIM paper (IntLIM 2.0): Eicher, T., K. D. Spencer, J. K. Siddiqui, R. Machiraju and E. A. Mathe (2023). "IntLIM 2.0: identifying multi-omic relationships dependent on discrete or continuous phenotypic measurements." Bioinformatics Advances 3(1): vbad009.
    4. Make sure the colors are consistent in Table 1.
    5. Line 156: The scientific name of the pepper species is inconsistent with other areas of the text.

    Figures:

    1. S1 should be provided as a table, not a figure.
    2. Please make S2 larger. It is difficult to read.
    3. S3 needs labels (x axis, y axis, legend).