BioFunctional: A Comprehensive App for Interpreting and Visualizing Functional Analysis of KEGG Pathways and Gene Ontologies

Abstract

BioFunctional is an application for interpreting and visualizing functional analyses based on KEGG pathways and Gene Ontologies. It gives researchers and specialists detailed functional information about their data, with a focus on biological pathways and gene functions. Built with a variety of techniques and libraries, such as Shiny, httr, dplyr, tibble, and rvest, the application provides a user-oriented interface that lets users load their data and begin analyzing it from scratch in a few steps.

The software supports in-depth exploration of KEGG pathways and Gene Ontologies, facilitating the analysis of complex biological processes. To achieve this, the functions in its scripts combine data manipulation methods and web scraping techniques to extract the necessary information from the official online databases, the Kyoto Encyclopedia of Genes and Genomes (KEGG) and QuickGO. These functions run in parallel, which makes requests to the database servers more efficient and lets users obtain results quickly even from large datasets.
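As a rough illustration of the parallel retrieval described above, the following Python sketch fans requests out over a thread pool. It is not BioFunctional's own code (the app is written in R); the names `kegg_url`, `fetch_all`, and `fake_fetch` are hypothetical, and which KEGG endpoint the app actually queries is an assumption. The fetch step is a stand-in so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def kegg_url(pathway_id: str) -> str:
    # KEGG REST "get" operation. Whether BioFunctional uses this
    # endpoint is an assumption made for illustration only.
    return f"https://rest.kegg.jp/get/{pathway_id}"


def fetch_all(
    ids: Iterable[str],
    fetch: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Call fetch(id) for every unique id concurrently; return {id: response}."""
    unique_ids = list(dict.fromkeys(ids))  # drop duplicates, keep input order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of unique_ids in its results.
        responses = list(pool.map(fetch, unique_ids))
    return dict(zip(unique_ids, responses))


def fake_fetch(pathway_id: str) -> str:
    # Stand-in for an HTTP GET of kegg_url(pathway_id); no network access.
    return f"ENTRY {pathway_id}"


results = fetch_all(["map00010", "map00020", "map00010"], fake_fetch)
```

De-duplicating identifiers before dispatch avoids sending the same request twice, which matters when many samples map to the same pathway or term.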

A crucial feature of BioFunctional is its ability to obtain ancestral information for KEGG pathways and Gene Ontologies, using the techniques described above. This makes it easier to understand the hierarchy of these ontologies and how each sample in a dataset is classified within them, so users can study the dataset at different taxonomic levels directly from the raw data. Additionally, the app can create interactive networks that represent all experimental data, visualizing the relationships between groups and ontologies without neglecting the established classification. These networks are a primary tool for understanding the meaning of the relationships observed within the displayed system.
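The ancestor tracing and network construction described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the app's implementation: the term identifiers and the `PARENTS` map are made up, and the functions `ancestors` and `network_edges` are hypothetical names. It assumes the hierarchy is available as a child-to-parents map, where a term may have several parents, as in the Gene Ontology.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Toy child -> parents map with invented identifiers (illustration only).
PARENTS: Dict[str, List[str]] = {
    "TERM:leaf": ["TERM:mid1", "TERM:mid2"],
    "TERM:mid1": ["TERM:root"],
    "TERM:mid2": ["TERM:root"],
    "TERM:root": [],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable by following parent links from `term`."""
    seen: Set[str] = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen


def network_edges(
    samples: List[Tuple[str, str]],
    parents: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Build unique edges: group -> term, then term -> parent up the hierarchy."""
    edges: Set[Tuple[str, str]] = set()
    for group, term in samples:
        edges.add((group, term))
        for node in {term} | ancestors(term, parents):
            for parent in parents.get(node, []):
                edges.add((node, parent))
    return sorted(edges)


samples = [("control", "TERM:leaf"), ("treated", "TERM:mid1")]
edges = network_edges(samples, PARENTS)
```

The resulting edge list links each experimental group to its terms and each term to its parents, which is the kind of structure a network viewer needs to draw groups and ontologies together with their classification.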

Key features of BioFunctional include:

  • Interactive visualization: Create and explore networks to visualize relationships between groups and ontologies.

  • Hierarchical analysis: Trace ancestral information for KEGG pathways and Gene Ontologies to understand hierarchical classifications.

  • Efficient processing: Employ parallel processing for rapid data analysis, even with large datasets.

  • Intuitive interface: A user-friendly design simplifies data exploration and analysis.

Due to these attributes, the software represents a valuable tool for analysts involved in the study of KEGG pathways and Gene Ontologies. By providing an intuitive interface with advanced data processing techniques, it empowers researchers to unravel the intricacies of biological functions and gain insights into the relationships between genes or molecular components.

Supplementary information

https://github.com/alexrodriguezmena/BIOFunctional

Article activity feed

  1. In conclusion, high-throughput biological data

    I appreciate that you have made a publicly available tool that is meant to aid in the interpretation of large datasets. GO and KEGG mappings are widely used by the biological community, and I agree that trying to interpret them on a large scale is challenging. I was able to download and run your app, although I was not able to get the heatmap visualization to work. Overall, my main comment on this paper is that there seem to be a lot of extraneous or unnecessarily broad descriptions, while there is a strong lack of clarity around how to use the app effectively and what its actual contributions are. I think this paper would benefit from removing these unnecessary details and focusing more on what was built. The GitHub README would strongly benefit from a more step-by-step user tutorial if you want others to use your tool.

  2. Moreover, the AI study shows that official conversational artificial intelligence created by different developers can output useful conclusions of what is happening in the case studied, all by submitting the prompt created input that gives AI the information needed to give a non-expert user the keys to deeply understand what happens in his data.

    I am still not sure how this is involved at all. I saw the AI prompt file, but I don't see any discussion of informative outputs that you got back from it. I also do not understand its relevance to the tool you have built.

  3. The software is completely able to retrieve information from online databases, like Kyoto Encyclopedia of Genes and Genomes (KEGG) and QuickGo by using the methods explained previously, being also able to construct usable and optimal dataframes for further processes from the data obtained.

    I think a major component missing here is a clear description of the workflow you are proposing. Do users first perform GO and KEGG mapping of their genes of interest and then use your software to pull additional layers of information? I looked at your sample files and tried running the app, and it seems that for GO, it just retrieves the first ancestor term. If this is the case, it would be much simpler to just say "our software can be used to pull the first ancestor terms for a list of GO terms of interest. The user can then view this first ancestor information overlaid on the network visualization" or something like this.

  4. Figure 5.

    I'm not sure that figures 1 through 5 are needed or add much to the paper. It might be more useful to show more of the outputs of the app, or walk users through a specific use case that demonstrates how the app can be used to explore a biological question.

  5. 3.2 Data Pre-Processing

    It is really unclear what the point of this section is or how it relates to the application you have produced. Are you describing pre-filtering steps that you have done on your own data? Or providing instructions on steps that users should take? In either case, I do not think you need sections to explain differential expression analysis or gene set enrichment analysis, as these are well-established techniques. It would be a lot more helpful to provide a clear description of what the use cases are for this application, what the starting point is for using your application, and how the data should be formatted. As a possible example: "Users will upload a csv file to the application that contains results of differential expression analysis. Users will need to provide GO or KEGG mappings for each gene in their analysis and information on experimental groups that they would like to compare." I downloaded and ran your application, but it is still unclear to me how this section relates to your app.
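    To make the kind of input description requested here concrete, the following Python sketch loads and checks a CSV of the sort the comment proposes. It is purely illustrative: the column names `gene`, `ontology_id`, and `group`, and the function `load_rows`, are hypothetical and do not reflect the format BioFunctional actually expects.

```python
import csv
import io
from typing import Dict, List, TextIO

# Hypothetical column names; the app's real input format may differ.
REQUIRED_COLUMNS = {"gene", "ontology_id", "group"}


def load_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Read a CSV, check required columns, and drop rows with no mapping."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Keep only genes that carry a GO or KEGG identifier.
    return [row for row in reader if row["ontology_id"].strip()]


example = io.StringIO(
    "gene,ontology_id,group\n"
    "geneA,GO:0008150,control\n"
    "geneB,,treated\n"
    "geneC,map00010,treated\n"
)
rows = load_rows(example)
```

    Failing early on missing columns gives users a direct message about what the file should contain, which is the information the comment above asks the paper to state.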

  6. Also, a new category was added in the filter feature of the software, and it was the possibility of adding an extra column in the table named “Disease” (defined by our dataset sample case), where the diseases name would be extracted from the raw data and stored for further analysis, only useful when your file stores experimental data of more than one disease, basically it request the input column to inspect and the names of the different diseases.

    I'm not sure I understand this section/the utility of what you are describing here for the disease column. It might be helpful to revisit this section and try to reword it for clarity.

  7. The aim of this project is to develop an user-friendly application to

    I love this idea! I've also spent a lot of time figuring out the best way to visualize this information for my data. Tools that make this easier are definitely needed.