From Answers to Insights: Unveiling the Strengths and Limitations of ChatGPT and Biomedical Knowledge Graphs

Yu Hou
Jeremy Yeung
Hua Xu
Chang Su
Fei Wang
Rui Zhang

Abstract

Purpose: Large Language Models (LLMs) have shown exceptional performance in various natural language processing tasks, benefiting from their language generation capabilities and ability to acquire knowledge from unstructured text. However, in the biomedical domain, LLMs face limitations that lead to inaccurate and inconsistent answers. Knowledge Graphs (KGs) have emerged as valuable resources for organizing structured information. Biomedical Knowledge Graphs (BKGs) have gained significant attention for managing diverse and large-scale biomedical knowledge. The objective of this study is to assess and compare the capabilities of ChatGPT and existing BKGs in question-answering, biomedical knowledge discovery, and reasoning tasks within the biomedical domain.

Methods: We conducted a series of experiments to assess the performance of ChatGPT and the BKGs in various aspects of querying existing biomedical knowledge, knowledge discovery, and knowledge reasoning. Firstly, we tasked ChatGPT with answering questions sourced from the "Alternative Medicine" sub-category of Yahoo! Answers and recorded the responses. Additionally, we queried BKG to retrieve the relevant knowledge records corresponding to the questions and assessed them manually. In another experiment, we formulated a prediction scenario to assess ChatGPT's ability to suggest potential drug/dietary supplement repurposing candidates. Simultaneously, we utilized BKG to perform link prediction for the same task. The outcomes of ChatGPT and BKG were compared and analyzed. Furthermore, we evaluated ChatGPT and BKG's capabilities in establishing associations between pairs of proposed entities. This evaluation aimed to assess their reasoning abilities and the extent to which they can infer connections within the knowledge domain.

Results: The results indicate that ChatGPT with GPT-4.0 outperforms both GPT-3.5 and BKGs in providing existing information. However, BKGs demonstrate higher reliability in terms of information accuracy. ChatGPT exhibits limitations in performing novel discoveries and reasoning, particularly in establishing structured links between entities compared to BKGs.

Conclusions: To address the limitations observed, future research should focus on integrating LLMs and BKGs to leverage the strengths of both approaches. Such integration would optimize task performance and mitigate potential risks, leading to advancements in knowledge within the biomedical field and contributing to the overall well-being of individuals.

Version published to 10.21203/rs.3.rs-3185632/v1 on Research Square
Aug 1, 2023

Related articles

Beyond Identifier Matching: An Empirical Characterization of Failure Modes in Biomedical Knowledge Graph Integration

Agentic Authoring of OMOP Concept Sets from Natural Language

Genosolver: Rare Disease Diagnosis through Holistic Integration of Unstructured Clinical Narratives Using Large Language and Reasoning Models