Knowledge graph–based thought: a knowledge graph–enhanced LLM framework for pan-cancer question answering
This article has been reviewed by the following groups
Listed in
- Evaluated articles (GigaScience)
Abstract
Background
In recent years, large language models (LLMs) have shown promise in various domains, notably in biomedical sciences. However, their real-world application is often limited by issues like erroneous outputs and hallucinatory responses.
Results
We developed the knowledge graph–based thought (KGT) framework, an innovative solution that integrates LLMs with knowledge graphs (KGs) to improve their initial responses by utilizing verifiable information from KGs, thus significantly reducing factual errors in reasoning. The KGT framework demonstrates strong adaptability and performs well across various open-source LLMs. Notably, KGT can facilitate the discovery of new uses for existing drugs through potential drug–cancer associations and can assist in predicting resistance by analyzing relevant biomarkers and genetic mechanisms. To evaluate the knowledge graph question answering (KGQA) task within biomedicine, we utilize a pan-cancer knowledge graph to develop a benchmark named Pan-cancer Question Answering (PcQA).
Conclusions
The KGT framework substantially improves the accuracy and utility of LLMs in the biomedical field. This study serves as a proof of concept, demonstrating its exceptional performance in biomedical question answering.
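At a high level, the framework grounds the LLM's answer in facts retrieved from a knowledge graph before it responds. The following is a minimal sketch of such a KG-grounded QA loop; the Neo4j connection details, the `ask_llm` callable, and the prompt wording are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of a KG-grounded question-answering loop in the spirit of KGT.
# The Neo4j URI/credentials, the `ask_llm` callable, and the prompts are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def retrieve_facts(cypher: str) -> list[str]:
    """Run a Cypher query against the KG and return each record as a fact string."""
    with driver.session() as session:
        return [", ".join(f"{key}: {value}" for key, value in record.items())
                for record in session.run(cypher)]

def answer_with_kg(question: str, ask_llm) -> str:
    """Ground the LLM's answer in verifiable facts retrieved from the KG."""
    # 1. Ask the LLM to translate the question into a Cypher query.
    cypher = ask_llm(f"Write a Cypher query that answers: {question}")
    # 2. Retrieve verifiable facts from the knowledge graph.
    facts = retrieve_facts(cypher)
    # 3. Ask the LLM to answer using only the retrieved facts.
    context = "\n".join(facts) if facts else "No matching facts were found."
    return ask_llm(f"Facts:\n{context}\n\nQuestion: {question}\n"
                   "Answer using only the facts above.")
```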
Article activity feed
This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giae082), which carries out open, named peer review. These reviews are published under a CC-BY 4.0 license and were as follows:
Reviewer: Cody Bumgardner
We are just beginning to get a glimpse into the ways that large language models (LLMs) might advance biomedical informatics. I would consider the framework you have described a serious contribution to the state of the art in bridging LLMs and structured data. The use of LLMs for code generation and interpretation within the same request is also innovative. The application of your framework to MeSH (https://www.nlm.nih.gov/mesh/meshhome.html) and other broader linked ontologies would be very interesting. You might also consider integrating tool calling (which, with subgraphs, you are doing in a way), either to further reduce the dimensional space or to access data that does not otherwise have a graph structure. In this case, the content of your subgraph nodes might be the result of a function call. Congratulations on your work; it is a real contribution to our community.
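The tool-calling idea raised here could look roughly like the sketch below, where a subgraph node's content is produced by a registered function at question time rather than read from a stored property; the tool registry and the fetch_clinical_trials helper are hypothetical placeholders, not part of the published framework.

```python
# Hypothetical sketch of the reviewer's suggestion: a subgraph node whose content
# is the result of a function (tool) call. The registry and the example tool are
# placeholders, not part of the KGT framework.
from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("clinical_trials")
def fetch_clinical_trials(drug: str) -> str:
    # Placeholder: a real tool would query an external API or non-graph database.
    return f"Active clinical trials for {drug}: <external lookup result>"

def expand_node(node: dict) -> dict:
    """If a node declares a tool, fill its content from the tool's output."""
    if "tool" in node:
        return {**node, "content": TOOLS[node["tool"]](node["entity"])}
    return node

# Example: the node's content is resolved by a function call at question time.
print(expand_node({"entity": "Imatinib", "tool": "clinical_trials"}))
```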
This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giae082), which carries out open, named peer review. These reviews are published under a CC-BY 4.0 license and were as follows:
Reviewer: Linhao Luo
Summary: This paper proposes a novel framework called KGT that integrates Large Language Models (LLMs) with Knowledge Graphs (KGs) for pan-cancer question answering. The KGT framework can effectively retrieve knowledge from KGs and improve the accuracy of LLMs for question answering. Moreover, it can provide interpretable and faithful explanations with the help of structured KGs. Comments:
- This paper constructs a new dataset, denoted PcQA, from a customized KG called SOKG for the evaluation of pan-cancer question answering. This is a great contribution to the community. However, it is unclear how such a dataset was constructed. More details about the construction process and the statistics of the final dataset should be discussed in the paper. For example, how are the natural-language questions and answers generated? How is each question linked with the related KG information (i.e., entities and relations)? How many questions can be answered by the KG (i.e., the answer coverage rate)? How many questions were generated? What is the ratio of each question type defined in Table 2?
- In Table 2, the authors define four reasoning types. What about other reasoning types such as union and negation? Can these types be incorporated into the dataset?
- The proposed method is novel and interesting; however, some details are unclear. In the candidate path search, do we want to search reasoning paths or relational chains? The definitions of these two kinds of paths are also unclear; please give clear definitions of them in the preliminaries. If it is reasoning paths, is only the type information kept during the BFS?
- I do not understand why we need to generate a Cypher query to retrieve a subgraph and then construct relational paths from the KG. We could directly retrieve relational paths from the KG by BFS. What are the benefits and motivations of this two-stage pipeline? (A sketch contrasting the two strategies appears after this list.)
- What are the meanings of the ✗ and ✓ symbols in the figure, and how are they obtained?
- In the experiments, other advanced KGQA methods could be compared, e.g., RoG [1] and ToG [2].
- An analysis of the tokens, time, and cost used should be discussed in the paper.
- Can the proposed method be applied to other KGs (e.g., SynLethKG and SDKG) or KGQA tasks (e.g., MetaQA and FACTKG) to show its generalizability?

[1] Luo, L., Li, Y. F., Haf, R., & Pan, S. Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning. In The Twelfth International Conference on Learning Representations.
[2] Sun, J., Xu, C., Tang, L., Wang, S., Lin, C., Gong, Y., ... & Guo, J. (2023). Think-on-Graph: Deep and responsible reasoning of large language model with knowledge graph. arXiv preprint arXiv:2307.07697.
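For readers weighing the two-stage pipeline question above, the following sketch contrasts the two steps: a Cypher-retrieved subgraph (hard-coded here as triples) and a breadth-first search that reads relational chains off it. The schema, entity names, and relation names are illustrative assumptions rather than content taken from the paper.

```python
# Sketch of the two retrieval steps discussed above: (a) a subgraph as a Cypher
# query might return it, hard-coded here as triples, and (b) a BFS that extracts
# relational chains from that subgraph. All names are illustrative.
from collections import deque

subgraph = [
    ("Gefitinib", "TARGETS", "EGFR"),
    ("EGFR", "MUTATED_IN", "Lung adenocarcinoma"),
    ("EGFR", "RESISTANCE_VIA", "T790M"),
]

def relational_paths(triples, start, end, max_hops=3):
    """Breadth-first search over the subgraph, returning relation chains start -> end."""
    adjacency = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))
    paths, queue = [], deque([(start, [])])
    while queue:
        node, relations = queue.popleft()
        if node == end and relations:
            paths.append(relations)
            continue
        if len(relations) < max_hops:
            for relation, neighbor in adjacency.get(node, []):
                queue.append((neighbor, relations + [relation]))
    return paths

# Relation chains connecting the drug to the cancer within the retrieved subgraph.
print(relational_paths(subgraph, "Gefitinib", "Lung adenocarcinoma"))
# [['TARGETS', 'MUTATED_IN']]
```

One plausible reading of the design is that the BFS then runs over a small, question-relevant neighborhood rather than the full KG, keeping the path search cheap and the evidence auditable; the authors' actual motivation is what the comment asks them to clarify.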