Evaluating ChatGPT for Disease Prediction: A Comparative Study on Heart Disease and Diabetes

Abstract

Background: Chronic diseases place a significant burden on healthcare systems because they require long-term treatment. Early diagnosis is critical for effective management and risk reduction. Traditional diagnostic approaches face challenges in efficiency and cost. Digitized healthcare offers opportunities to reduce human error, improve clinical outcomes, and trace data. Artificial Intelligence (AI) has emerged as a transformative tool in healthcare, and the evolution of Generative AI represents a new wave. Large Language Models (LLMs), such as ChatGPT, are promising tools for enhancing diagnostic processes, but their potential in this domain remains underexplored. Methods: This study presents the first systematic evaluation of ChatGPT's performance in chronic disease prediction, specifically targeting heart disease and diabetes. It compares the effectiveness of zero-shot, few-shot, and Chain-of-Thought (CoT) reasoning, combined with feature selection techniques and different prompt formulations, on disease prediction tasks. The two latest versions of GPT-4 (GPT-4o and GPT-4o-mini) are tested, and the results are evaluated against the best models from the literature. Results: The results indicate that GPT-4o significantly outperformed GPT-4o-mini in all scenarios in terms of accuracy, precision, and F1-score. Moreover, a 5-shot learning strategy demonstrates superior performance to zero-shot, few-shot (3-shot and 10-shot), and various CoT reasoning strategies. With GPT-4o, the 5-shot strategy achieved an accuracy of 77.07% in diabetes prediction on the Pima Indian Diabetes Dataset, 75.85% on the Frankfurt Hospital Diabetes Dataset, and 83.65% in heart disease prediction. Furthermore, refining prompt formulations yielded notable improvements, particularly on the heart dataset (a 5% performance increase with GPT-4o), underscoring the importance of prompt engineering.
Conclusions: Although ChatGPT does not outperform traditional machine learning and deep learning models, the findings highlight its potential as a complementary tool in disease prediction. Additionally, this work sets a clear performance baseline for future work on these tasks.
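To make the few-shot setup concrete, the sketch below shows one way to assemble a 5-shot prompt from labeled tabular records for the diabetes task. The feature names follow the Pima Indian Diabetes Dataset; the prompt wording and helper names are illustrative assumptions, not the authors' exact template.

```python
# Hypothetical sketch of a k-shot prompt builder for tabular disease
# prediction (k=5 matches the best-performing strategy in the study).
# The instruction text and record format are assumptions for illustration.

FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

def format_record(values):
    """Render one patient record as 'name=value' pairs."""
    return ", ".join(f"{name}={v}" for name, v in zip(FEATURES, values))

def build_few_shot_prompt(examples, query, k=5):
    """examples: list of (feature_values, label) pairs with label 0 or 1.
    Returns a single prompt string ending with an open 'Diagnosis:' slot
    for the model to complete."""
    lines = ["Predict whether the patient has diabetes. Answer 1 for yes, 0 for no."]
    for values, label in examples[:k]:
        lines.append(f"Patient: {format_record(values)}")
        lines.append(f"Diagnosis: {label}")
    lines.append(f"Patient: {format_record(query)}")
    lines.append("Diagnosis:")
    return "\n".join(lines)
```

The returned string would then be sent to the model (e.g. via an API call to GPT-4o) and the single-token completion parsed as the predicted class; a zero-shot variant is the same prompt with `k=0`.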
