Assessment of the efficacy of ChatGPT responses to bacterial species-specific questions in microbiology.

Abstract

Background: ChatGPT, an OpenAI chatbot, currently serves as a valuable tool for learning and education, and it increasingly provides information on microbiology as its popularity grows among students. However, assessing the accuracy of ChatGPT's responses is essential because large language models (LLMs) are prone to "hallucinations". Objectives: This study evaluated the accuracy of ChatGPT's responses to general questions on bacterial species and assessed whether the responses included key microbiological terms typically expected in academic or examination settings. Methodology: Questions were designed to reflect interactions at three language proficiency levels: low, moderate, and high. A clinical microbiologist finalized a list of 15 bacterial species, each with 18 specific questions of both local and international relevance. These questions were then posed to the ChatGPT 3.5 and 4.1 mini models, simulating real user interactions. Responses were evaluated against a microbiology reference guide and categorized as accurate, mixed/incomplete, or inaccurate. Results: Average scores for inaccurate, mixed/incomplete, and accurate answers were 1.5%, 58.1%, and 40.4% for the 3.5 model and 0.5%, 43.2%, and 56.3% for the 4.1 mini model, respectively. While high-proficiency questions yielded a higher percentage of accurate responses, all other results were either mixed/incomplete or inaccurate. Conclusion: The findings suggest that precise questions yielded more accurate responses, whereas imprecise questions often led to partially correct responses. Notably, ChatGPT 4.1 mini gave clearer and more reliable answers than ChatGPT 3.5. The study emphasizes the influence of question formulation on response accuracy and recommends further research exploring more advanced LLMs such as the ChatGPT-4o and ChatGPT-o3 models.
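To make the evaluation workflow described in the Methodology concrete, the sketch below shows one way the 15 species x 18 questions could be batch-prompted and the expert-assigned categories tallied into percentages. It is a minimal illustration under assumed names only (the openai Python client, the "gpt-3.5-turbo" and "gpt-4.1-mini" model identifiers, and the ask/category_percentages helpers are not from the paper); the study itself simulated real user interactions in the chat interface and relied on a clinical microbiologist's grading, and its exact averaging may differ.

```python
# Hypothetical sketch only: the study prompted the chat interface directly and
# used expert human grading; the model identifiers, API usage, and tally logic
# below are illustrative assumptions, not the authors' procedure.
from collections import Counter

from openai import OpenAI  # official OpenAI Python client (assumed available)

client = OpenAI()  # expects OPENAI_API_KEY in the environment
MODELS = ["gpt-3.5-turbo", "gpt-4.1-mini"]  # stand-ins for "3.5" and "4.1 mini"
CATEGORIES = ("accurate", "mixed/incomplete", "inaccurate")


def ask(model: str, question: str) -> str:
    """Send one bacterial-species question to one model and return its reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


def category_percentages(labels: list[str]) -> dict[str, float]:
    """Convert expert-assigned category labels into percentage scores."""
    counts = Counter(labels)
    total = len(labels)
    return {cat: 100.0 * counts.get(cat, 0) / total for cat in CATEGORIES}


# Example flow: 15 species x 18 questions = 270 replies per model. After a
# microbiologist labels each reply against the reference guide, e.g.
#   labels = ["accurate", "mixed/incomplete", ...]
# a per-model breakdown analogous to the abstract's percentages is given by
#   category_percentages(labels)
```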
