Assessment of the efficacy of ChatGPT responses to bacterial species-specific questions in microbiology.


Abstract

Background: ChatGPT, an OpenAI chatbot, serves as a valuable present-day tool for learning and education, and it also offers information on microbiology as its popularity grows among students. However, assessing the accuracy of ChatGPT's responses is essential due to the potential for "hallucinations" in large language models (LLMs). Objectives: This study evaluated the accuracy of ChatGPT's responses to general questions on bacterial species and assessed whether the responses included key microbiological terms typically expected in academic or examination settings. Methodology: Questions were designed to reflect interactions at three language proficiency levels: low, moderate, and high. A clinical microbiologist finalized a list of 15 bacterial species, each with 18 specific questions of both local and international relevance. These questions were then prompted to the ChatGPT 3.5 and 4.1 mini models, simulating real user interactions. Responses were evaluated against a microbiology reference guide and categorized as accurate, mixed/incomplete, or inaccurate. Results: Average scores for inaccurate, mixed/incomplete, and accurate answers were 1.5%, 58.1%, and 40.4% for the 3.5 model and 0.5%, 43.2%, and 56.3% for the 4.1 mini model, respectively. While high-proficiency questions yielded a higher percentage of accurate responses, all other results were either mixed/incomplete or inaccurate. Conclusion: The findings suggest that precise questions yielded more accurate responses, while imprecise questions often led to partially correct responses. Notably, ChatGPT 4.1 mini gave clearer and more reliable answers than ChatGPT 3.5. The study emphasizes the influence of question formulation on response accuracy and recommends further research exploring more advanced LLMs such as the ChatGPT-4o and ChatGPT-o3 models.

Article activity feed

  1. The figures within this manuscript were commented upon by the reviewers. The redrafted manuscript has not fully addressed these comments, and thus the figures do not meet the standards for publication. Please complete the following: 1) Please merge Figures 3, 4, 5, 6, 7, and 8 into a single table. Currently the data are contained within screenshots and are difficult to access; please extract the text from these screenshots and compile it into a single table. 2) Please put Figures 9 and 10 together in a single figure, with the two graphs as two panels. 3) Please merge Figures 12 and 13 into a single table/figure. The pie charts are unclear, and within the figure legend each digit appears in two categories; please display this data in a more suitable manner.

  2. Comments to Author

    Overall, I think this is an interesting paper. It highlights the caveats and uses of AI tools, and I think that this tier of journal is a good place for it. I can imagine it being cited by other studies. Personally, I would encourage the authors to also compare the results to how human test participants perform at different proficiency levels, or to how a simple programme/decision tree performs relative to the ChatGPT models, so the models could be compared to existing technologies. However, I think that, as a major improvement before this paper is published, a great deal of the figures should be concatenated or placed into supplementary materials. An article with a double-digit figure count should not contain this little data, and there is much room for figure improvement. Many of the pie charts, and especially the tables, should be combined into one table rather than presented in line. I can imagine the same paper with two or three figures and one table.

    Please rate the manuscript for methodological rigour

    Good

    Please rate the quality of the presentation and structure of the manuscript

    Very good

    To what extent are the conclusions supported by the data?

    Strongly support

    Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?

    No

    Is there a potential financial or other conflict of interest between yourself and the author(s)?

    No

    If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?

    Yes