Evaluating the accuracy and consistency of ChatGPT for the management of type 2 diabetes: A cross-sectional study

Abstract

Large language models (LLMs) have fundamentally changed how patients and clinicians retrieve information; however, it is unclear how accurately and consistently widely available LLMs answer medical questions. Our objective was to evaluate the accuracy and consistency of ChatGPT in answering questions about the management of type 2 diabetes mellitus (T2DM). Three users each asked ChatGPT the same 13 questions about medications from the five most common classes of T2DM medications. A response was labelled inconsistent if the response provided to one user differed from the response provided to at least one other user in the same domain for the same medication. A response was labelled inaccurate if the information provided by ChatGPT was incorrect based on the most recent FDA-approved drug label and review by an expert reviewer. Additionally, one user asked ChatGPT 26 basic questions about the management of T2DM, each of which was categorized as correct or incorrect. We summarized all results using descriptive statistics. ChatGPT delivered inaccurate responses in seven of the 13 domains and inconsistent responses in seven of the 13 domains, spanning drugs in all five classes of T2DM medication. Of ChatGPT’s responses to the 26 basic T2DM treatment questions, 7 (27%) were incorrect. In this cross-sectional study, we found that ChatGPT commonly provided incorrect or inconsistent responses to enquiries about the management of type 2 diabetes.
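The consistency rule described above reduces to a simple comparison across users. The following is a minimal Python sketch of that rule on hypothetical data; the study itself relied on manual comparison and expert review rather than code, and the medication/domain keys and answers shown here are invented for illustration only.

```python
# Minimal sketch of the study's consistency rule, on hypothetical data:
# a (medication, domain) pair is "inconsistent" if the answer given to one
# user differs from the answer given to at least one other user.

# Hypothetical responses: (medication, domain) -> one normalised answer per user.
responses = {
    ("metformin", "renal dose adjustment"): ["reduce dose", "reduce dose", "avoid use"],
    ("metformin", "common side effects"): ["GI upset", "GI upset", "GI upset"],
    ("empagliflozin", "common side effects"): ["genital infections"] * 3,
}

# A set of size > 1 means at least two users received different answers.
inconsistent = {key: len(set(answers)) > 1 for key, answers in responses.items()}

# Descriptive summary, in the abstract's "seven of 13 domains" style.
n = sum(inconsistent.values())
print(f"Inconsistent in {n} of {len(responses)} medication/domain pairs")
```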
