A Comparative Study of the Accuracy and Readability of Responses from Four Generative AI Models to COVID-19-Related Questions

Abstract

The purpose of this study is to compare the accuracy and readability of Coronavirus Disease 2019 (COVID-19) prevention and control knowledge texts generated by four current generative artificial intelligence (AI) models, two international (ChatGPT and Gemini) and two domestic (Kimi and Ernie Bot), and to assess other performance characteristics of the texts they generate. The questions and answers in the COVID-19 prevention guidelines issued by the U.S. Centers for Disease Control and Prevention (CDC) serve as the evaluation criteria, and the accuracy, readability, and comprehensibility of the texts generated by each model are scored against these CDC standards. A neural network model is then used to identify the factors that affect readability, and the medical topics of the generated texts are analyzed with text-mining techniques. Finally, a questionnaire-based manual scoring approach is used to evaluate the AI-generated texts and is compared against automated machine scoring.

Accuracy: domestic models produce more accurate texts, while international models are more reliable. Readability: domestic models produce more fluent and publicly accessible language, whereas international models generate more standardized, formally structured texts with greater consistency. Comprehensibility: domestic models are easier to understand, while international models deliver more stable output. Readability factors: average words per sentence (AWPS) is the most significant factor influencing readability across all models. Topic analysis: ChatGPT emphasizes epidemiological knowledge; Gemini focuses on general medical and health topics; Kimi provides more multidisciplinary content; and Ernie Bot concentrates on clinical medicine. The empirical results show that manual and machine scoring are highly consistent on the SimHash and Flesch-Kincaid Grade Level (FKGL) indicators, supporting the validity of the evaluation method proposed in this paper.

Conclusion: Texts generated by the domestic models are more accessible and better suited to public education, clinical communication, and health consultations. In contrast, the international models generate specialized knowledge more accurately, particularly in epidemiological studies and in assessing literature on disease severity. The inclusion of manual evaluation confirms the reliability of the proposed assessment framework. Future AI-generated knowledge systems for infectious disease control should therefore balance professional rigor with public comprehensibility, so as to provide reliable and accessible reference materials during major infectious disease outbreaks.
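For readers unfamiliar with the two automated indicators on which manual and machine scoring agreed, the sketch below illustrates how they are typically computed. This is a minimal illustration, not the authors' pipeline: the word-level tokenization, the vowel-group syllable heuristic, and the use of MD5 as the per-token hash in SimHash are all assumptions made for the example.

    # Minimal sketch of the two automated metrics named in the abstract:
    # FKGL (with AWPS as an intermediate) and SimHash similarity.
    import hashlib
    import re

    def count_syllables(word):
        """Crude vowel-group heuristic; real pipelines use a dictionary."""
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def awps_and_fkgl(text):
        """Return (AWPS, FKGL).

        FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
        """
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        awps = len(words) / max(1, len(sentences))
        spw = sum(count_syllables(w) for w in words) / max(1, len(words))
        return awps, 0.39 * awps + 11.8 * spw - 15.59

    def simhash(text, bits=64):
        """Charikar-style SimHash fingerprint over word tokens."""
        v = [0] * bits
        for token in re.findall(r"[A-Za-z']+", text.lower()):
            h = int(hashlib.md5(token.encode()).hexdigest(), 16)
            for i in range(bits):
                v[i] += 1 if (h >> i) & 1 else -1
        return sum(1 << i for i in range(bits) if v[i] > 0)

    def simhash_similarity(a, b, bits=64):
        """1 minus the normalized Hamming distance of the fingerprints."""
        distance = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
        return 1.0 - distance / bits

    reference = "Wash your hands often with soap and water for 20 seconds."
    generated = "Wash hands frequently with soap and water, for twenty seconds."
    awps, grade = awps_and_fkgl(generated)
    print(f"AWPS={awps:.1f}  FKGL={grade:.1f}  "
          f"SimHash similarity={simhash_similarity(reference, generated):.2f}")

A lower FKGL indicates more accessible text, which is how the readability comparison between domestic and international models is quantified; SimHash similarity against the CDC reference answers serves as the accuracy-style agreement measure.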
