Comparative analysis of Chinese large language model performance on atrial fibrillation questions

Abstract

Background
The first seven Chinese large language models (LLMs) were launched to the public on August 31, 2023. However, the extent to which Chinese LLMs can assist patients with atrial fibrillation (AF) remains unknown. We sought to assess how well Chinese LLMs respond to questions from patients with AF.

Methods
This cross-sectional study compared seven Chinese LLM chatbots: ABAB, Baichuan, ChatGLM, Doubao, Ernie Bot, SenseChat and ZidongTaichu. First, cardiologists compiled a list of questions frequently asked by patients with AF, and the response of each LLM to every question was collected. We developed a scoring system, SCECCE, which covers six aspects: safety, correctness, error, completeness, conciseness and elaboration. An expert committee assessed each response using the SCECCE scoring system.

Results
In total, 231 responses were obtained. Overall, the median SCECCE score was 10 [IQR, 7-10], with a mean (SD) score of 8.6 (2.0). No statistically significant differences in SCECCE scores were observed among the seven LLMs (p = 0.08). The maximum possible total SCECCE score for each LLM was 330 points, and Ernie Bot attained the highest total score of 299 points. Doubao's responses were safe for 97% of the questions. For correctness and error, the overall comparison across groups revealed no statistically significant difference; Ernie Bot performed best, with an accuracy rate of 87.9%.

Conclusion
Our findings demonstrate that although Chinese LLMs show strong potential for medical consultation, review and evaluation by medical professionals remain essential.
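The reported totals are internally consistent under two assumptions that the abstract implies but does not state: each of the seven chatbots answered the same set of questions, and each response was scored out of 10. The short Python sketch below back-calculates the question count and the maximum total score from the reported figures. The constant and function names are illustrative only, and the counts 32 and 29 are inferred from the reported percentages (97% and 87.9% of 33 questions), not reported in the abstract.

```python
# Back-of-the-envelope check of the totals reported in the abstract.
# ASSUMPTIONS (inferred, not stated in the abstract): all seven chatbots answered
# the same question set, and each response received a SCECCE score from 0 to 10.

N_MODELS = 7                  # chatbots compared
N_RESPONSES = 231             # responses collected in total
MAX_SCORE_PER_RESPONSE = 10   # assumed ceiling of a single SCECCE score


def questions_per_model(n_responses: int, n_models: int) -> int:
    """Number of questions each chatbot answered, if responses split evenly."""
    per_model, remainder = divmod(n_responses, n_models)
    if remainder:
        raise ValueError("responses do not divide evenly across chatbots")
    return per_model


def percent(part: float, whole: float) -> float:
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * part / whole, 1)


n_questions = questions_per_model(N_RESPONSES, N_MODELS)
max_total = n_questions * MAX_SCORE_PER_RESPONSE  # maximum total score per chatbot

print(n_questions, max_total)    # 33 330
print(percent(299, max_total))   # Ernie Bot total as a share of the maximum: 90.6
print(percent(32, n_questions))  # inferred 32 of 33 questions: 97.0 (Doubao safety)
print(percent(29, n_questions))  # inferred 29 of 33 questions: 87.9 (Ernie Bot accuracy)
```

Only 32 of 33 rounds to 97% and only 29 of 33 rounds to 87.9%, so these counts follow directly from the 33-question assumption.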

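The abstract summarizes per-response scores as a median [IQR] and a mean (SD) and compares the seven chatbots with a single p-value (p = 0.08), but it does not name the test used. Because SCECCE scores are bounded and cluster at the top of the scale (median 10, IQR 7-10), a rank-based test such as the Kruskal-Wallis H test is one plausible choice. The sketch below assumes that test and is not necessarily the authors' analysis. The function names are hypothetical, only the Python standard library is used, and no study data are included: you would pass in one list of per-response scores per chatbot.

```python
import math
from statistics import mean, median, quantiles, stdev


def summarize(scores):
    """Median, interquartile range, mean and sample SD of per-response scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {"median": median(scores), "iqr": (q1, q3),
            "mean": mean(scores), "sd": stdev(scores)}


def average_ranks(values):
    """1-based ranks of values; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X >= x), via the power series for
    the regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1 / a
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H statistic and its chi-square p-value
    with len(groups) - 1 degrees of freedom."""
    pooled = [s for g in groups for s in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Correct for ties, which are frequent with integer scores on a 0-10 scale.
    tie_counts = {}
    for s in pooled:
        tie_counts[s] = tie_counts.get(s, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:  # every score identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)
```

With seven chatbots the test has 6 degrees of freedom. In practice, `scipy.stats.kruskal` computes the same tie-corrected statistic; the hand-rolled version only makes the ranking and tie correction explicit.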