Evaluating ChatGPT for Disease Prediction: A Comparative Study on Heart Disease and Diabetes

Abstract

Chronic diseases place a significant burden on healthcare systems due to the need for long-term treatment, and early diagnosis is critical for effective management and risk reduction. Traditional diagnostic approaches face various challenges in terms of efficiency and cost, while digitized healthcare offers opportunities to reduce human error, improve clinical outcomes, and trace data. Artificial Intelligence (AI) has emerged as a transformative tool in healthcare, and the evolution of Generative AI represents a new wave. Large Language Models (LLMs), such as ChatGPT, are promising tools for enhancing diagnostic processes, but their potential in this domain remains underexplored. This study presents the first systematic evaluation of ChatGPT's performance in chronic disease prediction, specifically targeting heart disease and diabetes. It compares the effectiveness of zero-shot, few-shot, and Chain-of-Thought (CoT) reasoning, combined with feature selection techniques and prompt formulations, in disease prediction tasks. The two latest versions of GPT-4 (GPT-4o and GPT-4o-mini) are tested, and the results are evaluated against the best models from the literature. The results indicate that GPT-4o significantly outperforms GPT-4o-mini in all scenarios in terms of accuracy, precision, and F1-score. Moreover, the 5-shot learning strategy demonstrates superior performance compared to zero-shot, the other few-shot settings (3-shot and 10-shot), zero-shot CoT reasoning, 3-shot CoT reasoning, and the proposed Knowledge-enhanced CoT. It achieves an accuracy of 77.07% in diabetes prediction on the Pima Indian Diabetes Dataset, 75.85% on the Frankfurt Hospital Diabetes Dataset, and 83.65% in heart disease prediction. Furthermore, refining prompt formulations yields notable improvements, particularly for the heart dataset, emphasizing the importance of prompt engineering.
Clarifying column names and categorical values contributed a 5% performance increase when using GPT-4o. In addition, the proposed Knowledge-enhanced 3-shot CoT demonstrated notable improvements over standard CoT in diabetes prediction, while its effectiveness in heart disease prediction was limited. A likely reason is that heart disease is influenced by a more complex combination of features, which underlines the importance of tailoring reasoning strategies to the specific characteristics of each disease. Although ChatGPT does not outperform traditional machine learning and deep learning models, these findings highlight its potential as a complementary tool in disease prediction. It also demonstrates promising results, particularly with refined prompt designs and feature selection, providing insights for future research to improve the model's performance.
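To make the few-shot setup concrete, the following is a minimal sketch (not taken from the paper) of how a k-shot prompt for diabetes prediction might be assembled. The feature names follow the Pima Indian Diabetes Dataset; the instruction wording and the example values are illustrative assumptions.

```python
# Sketch of k-shot prompt assembly for diabetes prediction.
# Feature names match the Pima Indian Diabetes Dataset; the task
# instruction and labels below are illustrative, not from the paper.

FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

def format_record(record):
    """Render one patient record as comma-separated 'name=value' pairs."""
    return ", ".join(f"{name}={record[name]}" for name in FEATURES)

def build_few_shot_prompt(examples, query):
    """Build a k-shot prompt: k labeled cases followed by the query case.

    `examples` is a list of (record, label) pairs; `query` is a record
    whose diagnosis the model is asked to complete.
    """
    lines = ["Predict whether the patient has diabetes. "
             "Answer 'yes' or 'no'.", ""]
    for record, label in examples:
        lines.append(f"Patient: {format_record(record)}")
        lines.append(f"Diagnosis: {label}")
        lines.append("")
    lines.append(f"Patient: {format_record(query)}")
    lines.append("Diagnosis:")  # left open for the model to complete
    return "\n".join(lines)
```

The resulting string would be sent as a single user message to the model (e.g., GPT-4o); with an empty `examples` list the same function yields a zero-shot prompt, so one helper covers both settings.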
