Assessment of the efficacy of ChatGPT responses to bacterial species-specific questions in microbiology.


Abstract

Background: ChatGPT, an OpenAI chatbot, serves as a valuable present-day tool for learning and education, and it also offers information on microbiology as its popularity grows among students. However, assessing the accuracy of ChatGPT's responses is essential due to the potential for "hallucinations" in large language models (LLMs). Objectives: This study evaluated the accuracy of ChatGPT's responses to general questions on bacterial species and assessed whether the responses included key microbiological terms typically expected in academic or examination settings. Methodology: Questions were designed to reflect interactions at three language proficiency levels: low, moderate, and high. A clinical microbiologist finalized a list of 15 bacterial species, each with 18 specific questions of both local and international relevance. These questions were then prompted to the ChatGPT 3.5 and 4.1 mini models, simulating real user interactions. Responses were evaluated against a microbiology reference guide and categorized as accurate, mixed/incomplete, or inaccurate. Results: Average scores for inaccurate, mixed/incomplete, and accurate answers were 1.5%, 58.1%, and 40.4% for the 3.5 model and 0.5%, 43.2%, and 56.3% for the 4.1 mini model, respectively. While high-proficiency questions yielded a higher percentage of accurate responses, all other results were either mixed/incomplete or inaccurate. Conclusion: The findings suggest that precise questions yielded more accurate responses, while imprecise questions often led to partially correct responses. Notably, ChatGPT 4.1 mini gave clearer and more reliable answers than ChatGPT 3.5. The study emphasizes the influence of question formulation on response accuracy and recommends further research exploring more advanced LLMs such as the ChatGPT-4o and ChatGPT-o3 models.

Article activity feed

  1. The figures within this manuscript were commented upon by the reviewers. The redrafted manuscript has not fully addressed these comments, and thus the figures do not meet the standards for publication. Please complete the following: 1) Please merge Figures 3, 4, 5, 6, 7, and 8 into a single table. Currently the data are contained within screenshots and are difficult to access; please extract the text from these screenshots and compile it into a single table. 2) Please put Figures 9 and 10 together in a single figure, with the two graphs as two panels. 3) Please merge Figures 12 and 13 into a single table/figure. The pie charts are unclear, and within the figure legend each digit appears in two categories; please display this data in a more suitable manner.

  2. Comments to Author

    Overall, I think this is an interesting paper. It highlights the caveats and uses of AI tools, and I think that this tier of journal is a good place for it. I can imagine it being cited by other studies. Personally, I would encourage the authors to also compare the results to how human test participants perform at different proficiency levels, or to how a simple programme/decision tree performs relative to the ChatGPT models, so the models could be compared to existing technologies. However, I think that, as a major improvement before this paper is published, a great deal of the figures should be concatenated or placed into supplementary materials. An article with a double-digit figure count should not contain this little data, and there is much room for figure improvement. Many of the pie charts, and especially the tables, should be combined into one table rather than presented in line. I can imagine the same paper with two or three figures and one table.

    Please rate the manuscript for methodological rigour

    Good

    Please rate the quality of the presentation and structure of the manuscript

    Very good

    To what extent are the conclusions supported by the data?

    Strongly support

    Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?

    No

    Is there a potential financial or other conflict of interest between yourself and the author(s)?

    No

    If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?

    Yes