BioFunctional: A Comprehensive App for Interpreting and Visualizing Functional Analysis of KEGG Pathways and Gene Ontologies

Alejandro Rodriguez
Antonio Monleon Getino

This article has been Reviewed by the following groups

Listed in

Evaluated articles (Arcadia Science)

Abstract

BioFunctional is an application for interpreting and visualizing functional analyses of KEGG pathways and gene ontologies. It gives researchers and specialists detailed functional information about their data, with a focus on biological pathways and gene functions. Built with libraries such as Shiny, httr, dplyr, tibble, and rvest, the application provides a user-oriented interface that lets users load their data and begin analyzing it from scratch in a few steps.

The software supports extensive exploration of KEGG pathways and Gene Ontologies, facilitating the analysis of complex biological processes. To achieve this, the functions in its scripts combine data manipulation methods with web scraping to extract the necessary information from the official online databases, the Kyoto Encyclopedia of Genes and Genomes (KEGG) and QuickGO. These functions run in parallel, so requests to the database servers are issued efficiently and users get results quickly even from a large dataset.
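As a rough illustration of the parallel retrieval described above, the following Python sketch issues one lookup per identifier through a thread pool. It is not the application's own code (BioFunctional is built on R libraries). The names `fetch_all` and `fake_fetch` and the `T:` identifiers are hypothetical, and the fake fetcher stands in for a real request to KEGG or QuickGO so that the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_all(ids: Iterable[str],
              fetch_one: Callable[[str], dict],
              max_workers: int = 8) -> Dict[str, dict]:
    """Fetch one record per identifier concurrently.

    `fetch_one` is a hypothetical callable that performs a single database
    request (for example an HTTP GET) and returns the parsed record.
    """
    unique_ids = list(dict.fromkeys(ids))  # drop duplicates, keep input order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so identifiers and results line up.
        results = list(pool.map(fetch_one, unique_ids))
    return dict(zip(unique_ids, results))


def fake_fetch(term_id: str) -> dict:
    """Offline stand-in for a real request (assumption, not a real API)."""
    return {"id": term_id, "name": f"name of {term_id}"}


records = fetch_all(["T:0001", "T:0002", "T:0001"], fake_fetch)
print(records)
```

Threads suit this kind of work because each request spends most of its time waiting on the server. A real client would also need to respect the rate limits of the database it queries.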

A crucial feature of BioFunctional is its ability to obtain ancestor information for KEGG pathways and gene ontologies, using the techniques described above. This makes it easier to understand the hierarchy of these ontologies and how each sample in a dataset is classified within them, so users can study the dataset at different levels of the hierarchy directly from the raw data. Additionally, the app can create interactive networks that represent all experimental data, showing the relationships between groups and ontologies while preserving the established classification. These networks are a primary tool for interpreting the relationships observed within the displayed system.
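The idea of tracing ancestors through an ontology hierarchy can be sketched as a breadth-first walk over a child-to-parents map. This is a minimal Python illustration under stated assumptions, not the application's implementation: the `PARENTS` map uses made-up identifiers rather than real GO or KEGG entries, and in practice the parent relations would come from the databases.

```python
from collections import deque
from typing import Dict, List

# Toy child -> parents map with made-up identifiers (not real GO or KEGG data).
PARENTS: Dict[str, List[str]] = {
    "T:leaf": ["T:mid_a", "T:mid_b"],
    "T:mid_a": ["T:root"],
    "T:mid_b": ["T:root"],
    "T:root": [],
}


def first_ancestors(term: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return only the direct parents of `term`."""
    return list(parents.get(term, []))


def all_ancestors(term: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return every ancestor of `term`, nearest first (breadth-first order)."""
    seen: List[str] = []
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a term can be reached through several parents
        seen.append(current)
        queue.extend(parents.get(current, []))
    return seen


print(first_ancestors("T:leaf", PARENTS))
print(all_ancestors("T:leaf", PARENTS))
```

Because a term may have several parents, the walk tracks visited terms so that a shared ancestor is reported once.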

Key features of BioFunctional include:

Interactive visualization: Create and explore networks to visualize relationships between groups and ontologies.
Hierarchical analysis: Trace ancestral information for KEGG pathways and Gene Ontologies to understand hierarchical classifications.
Efficient processing: Employ parallel processing for rapid data analysis, even with large datasets.
Intuitive interface: A user-friendly design simplifies data exploration and analysis.
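To make the network feature more concrete, the sketch below turns a table of (group, term) observations into the node and edge tables that network-drawing libraries typically consume. It is an illustrative Python sketch only: the input rows, the `ancestor_of` lookup, and the field names are assumptions, not the application's actual data model.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input: one row per (experimental group, ontology term) observation.
rows: List[Tuple[str, str]] = [
    ("control", "T:0001"),
    ("treated", "T:0001"),
    ("treated", "T:0002"),
    ("treated", "T:0002"),
]

# Hypothetical lookup from each term to its ancestor category.
ancestor_of: Dict[str, str] = {"T:0001": "Category A", "T:0002": "Category B"}


def build_network(rows: List[Tuple[str, str]],
                  ancestor_of: Dict[str, str]) -> Tuple[List[dict], List[dict]]:
    """Build node and edge tables; edge weight counts repeated observations."""
    edge_weights = Counter(rows)
    nodes: Dict[str, dict] = {}
    for group, term in edge_weights:
        nodes[group] = {"id": group, "kind": "group"}
        nodes[term] = {
            "id": term,
            "kind": "term",
            # Keep the classification so terms can be coloured or grouped by it.
            "ancestor": ancestor_of.get(term, "unknown"),
        }
    edges = [{"from": g, "to": t, "weight": w}
             for (g, t), w in edge_weights.items()]
    return list(nodes.values()), edges


nodes, edges = build_network(rows, ancestor_of)
print(nodes)
print(edges)
```

Storing the ancestor on each term node is one way to show group-to-term relationships without losing the hierarchical classification.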

These attributes make the software a valuable tool for analysts who study KEGG pathways and Gene Ontologies. By combining an intuitive interface with advanced data processing techniques, it helps researchers unravel the intricacies of biological functions and gain insight into the relationships between genes or molecular components.

Supplementary information

https://github.com/alexrodriguezmena/BIOFunctional

Arcadia Science
Nov 1, 2024

In conclusion, high-throughput biological data

I appreciate that you have made a publicly available tool that is meant to aid in the interpretation of large datasets. GO and KEGG mappings are widely used by the biological community and I agree that trying to interpret them on a large scale is challenging. I was able to download and run your app, although I was not able to get the heatmap visualization to work. Overall, my main comment on this paper is that there seem to be a lot of extraneous broad descriptions, but there is a strong lack of clarity around how to effectively use the app and what the actual contributions of the app are. I think this paper would benefit from removing these unnecessary details and focusing more on what was built. The GitHub README would strongly benefit from a more step-by-step user tutorial if you want others to use your tool.

Arcadia Science
Nov 1, 2024

Moreover, the AI study shows that official conversational artificial intelligence created by different developers can output useful conclusions of what is happening in the case studied, all by submitting the prompt created input that gives AI the information needed to give a non-expert user the keys to deeply understand what happens in his data.

I am still not sure how this is involved at all. I saw the AI prompt file, but I don't see any discussion of informative outputs that you got back from it. Also, I do not understand the relevance to the tool you have built.

Arcadia Science
Nov 1, 2024

The software is completely able to retrieve information from online databases, like Kyoto Encyclopedia of Genes and Genomes (KEGG) and QuickGo by using the methods explained previously, being also able to construct usable and optimal dataframes for further processes from the data obtained.

I think a major component missing here is a clear description of the workflow you are proposing. Do users first perform GO and KEGG mapping of their genes of interest and then use your software to pull additional layers of information? I looked at your sample files and tried running the app, and it seems that for GO, it just retrieves the first ancestor term. If this is the case, it would be much simpler to just say "our software can be used to pull the first ancestor terms for a list of GO terms of interest. The user can then view this first ancestor information overlaid on the network visualization" or something like this.

Arcadia Science
Nov 1, 2024

4.5 AI prompt Interpreter

It is very unclear why this information is included here.

Arcadia Science
Nov 1, 2024

Figure 5.

I'm not sure that figures 1 through 5 are needed or add much to the paper. It might be more useful to show more of the outputs of the app, or walk users through a specific use case that demonstrates how the app can be used to explore a biological question.

Arcadia Science
Nov 1, 2024

3.2 Data Pre-Processing

It is really unclear what the point of this section is or how it relates to the application you have produced. Are you describing pre-filtering steps that you have done on your own data? Or providing instructions on steps that users should take? In either case, I do not think you need sections to explain differential expression analysis or gene set enrichment analysis, as these are well-established techniques. It would be a lot more helpful to provide a clear description of what the use cases are for this application, what the starting point is for using your application, and how the data should be formatted. As a possible example: "Users will upload a CSV file to the application that contains results of differential expression analysis. Users will need to provide GO or KEGG mappings for each gene in their analysis and information on experimental groups that they would like to compare." I downloaded and ran your application, but it is still unclear to me how this section relates to your app.

Arcadia Science
Nov 1, 2024

Also, a new category was added in the filter feature of the software, and it was the possibility of adding an extra column in the table named “Disease” (defined by our dataset sample case), where the diseases name would be extracted from the raw data and stored for further analysis, only useful when your file stores experimental data of more than one disease, basically it request the input column to inspect and the names of the different diseases.

I'm not sure I understand this section/the utility of what you are describing here for the disease column. It might be helpful to revisit this section and try to reword it for clarity.

Arcadia Science
Nov 1, 2024

[11].

The order of references seems odd here. It would be easier to follow if the references were numbered in the order they appear.

Arcadia Science
Nov 1, 2024

https://github.com/alexrodriguezmena/BIOFunctional

Thank you for making your code available on GitHub! You might consider adding a license so that others understand how they can use your code. For reproducibility, it might also be helpful to make a release of the current version of the scripts.

Arcadia Science
Nov 1, 2024

The aim of this project is to develop an user-friendly application to

I love this idea! I've also spent a lot of time figuring out the best way to visualize this information for my data. Tools that make this easier are definitely needed.

Version published to 10.1101/2024.10.08.616405v1 on bioRxiv
Oct 12, 2024

