Research on the Application of Generative Artificial Intelligence to Evaluate Responses Related to Questions About COVID-19 in Terms of Their Accuracy and Readability

Zongjing Liang
Yun Kuang
Xiaobo Liang
Gongcheng Liang
Zhijie Li

Abstract

Objective: This study compares the accuracy and readability of COVID-19 prevention and control knowledge generated by four major generative artificial intelligence models, two international (ChatGPT and Gemini) and two domestic (Kimi and Ernie Bot), to characterize the performance of domestic and international models.

Methods: The knowledge Q&A from the COVID-19 prevention guidelines issued by the U.S. Centers for Disease Control and Prevention (CDC) served as the evaluation standard. The texts generated by the four models were compared with this standard in terms of accuracy, readability, and understandability. A neural network model based on intelligent algorithms was then used to extract the factors influencing the readability of the generated texts. Finally, text analysis was applied to explore the medical topics in the generated texts.

Results: Text accuracy: Domestic models showed higher accuracy in generated texts, while international models demonstrated better reliability. Text readability: Domestic models produced fluent language in a style suitable for public reading; international models exhibited better stability and tended to generate formal documentation. Text understandability: Domestic models had better readability; international models had more stable output. Readability influencing factors: For both domestic and international models, the sentence length indicator (AWPS) was the most important factor affecting readability. Topic analysis: ChatGPT focused more on epidemiological knowledge, Gemini on the healthcare field, Kimi on multidisciplinary information, and Ernie Bot on clinical medical topics.

Conclusion: Texts generated by domestic models are easy to understand, more suitable for public reading, and better suited to clinical testing, health consultation, and similar applications. Texts generated by international models have higher accuracy and professionalism, focusing more on epidemiological analysis, disease severity assessment, and related fields. Based on these findings, it is recommended that infectious disease prevention knowledge systems, such as those for COVID-19, pay more attention to the public's knowledge base and comprehension level, integrating professionalism and accessibility in AI-generated knowledge and thereby providing objective reference materials for future major infectious disease outbreaks.
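
The abstract names a sentence length indicator (AWPS) as the most important readability factor but does not define how it was computed. The following is a minimal sketch, assuming AWPS stands for average words per sentence and using a naive regex sentence splitter; the function name `average_words_per_sentence` and the tokenization rules are illustrative assumptions, not the authors' method.

```python
import re


def average_words_per_sentence(text: str) -> float:
    """Return the mean number of words per sentence for `text`.

    Assumption: AWPS = total words / total sentences. Sentences are split
    naively after '.', '!' or '?', so abbreviations such as "U.S." will be
    over-split; a real study would likely use a proper sentence tokenizer.
    """
    # Split wherever sentence-ending punctuation is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    # Count word-like tokens (letters, digits, apostrophes, hyphens).
    word_counts = [len(re.findall(r"[A-Za-z0-9'-]+", s)) for s in sentences]
    return sum(word_counts) / len(sentences)


if __name__ == "__main__":
    sample = "Wash your hands often. Stay home if you are sick."
    print(average_words_per_sentence(sample))
```

Under this definition, longer sentences raise AWPS, which most readability formulas treat as harder to read.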

Version published to 10.20944/preprints202505.1319.v1
May 16, 2025

Related articles

GenAI-Powered Text Personalization: Natural Language Processing Validation of Adaptation Capabilities

This article has 2 authors:
1. Linh Huynh
2. Danielle McNamara
This article has no evaluations. Latest version Apr 29, 2025

Comparative accuracy of ChatGPT-o1, DeepSeek R1, and Gemini 2.0 in answering general primary care questions

This article has 5 authors:
1. Guerino Recinella
2. Chiara Altini
3. Marco Cupardo
4. Iacopo Cricelli
5. Lorenzo Maestri
This article has no evaluations. Latest version Apr 19, 2025

Evaluation of Large Language Models: Review of Metrics, Applications, and Methodologies

This article has 1 author:
1. Satyadhar Joshi
This article has no evaluations. Latest version Apr 7, 2025