Large Language Models in Infectious Diseases: A Systematic Review

Abstract

Background
Clinical reasoning in infectious diseases relies on validated evidence. Large language models (LLMs) are being introduced into diagnosis, antimicrobial stewardship, and guideline interpretation before their safety and reliability have been established.

Methods
This systematic review, registered in PROSPERO (CRD420251155354), evaluated studies using GPT, Claude, Gemini, and retrieval-augmented or agentic systems for infectious disease decision-making. PubMed, CENTRAL, Scopus, and Web of Science were searched from January 2018 to September 2025. Two reviewers independently screened studies and extracted data. Risk of bias was assessed with QUADAS-AI.

Findings
Thirty-one studies met the inclusion criteria. Most were cross-sectional (61%) and vignette-based (68%). Only 32% used real clinical data, and only 23% had a low risk of bias. Safety issues were reported in 90% of studies: incomplete responses (61%), unsafe advice (23–32%), and fabricated content (32%). In antimicrobial stewardship, agreement with infectious disease specialists was approximately 50%. Diagnostic sensitivity for structured infections was 80–100%. Retrieval-augmented systems increased specificity from 35% to 75% and reduced hallucinations. Proprietary models outperformed open-source models but did not reach expert-level accuracy.

Interpretation
LLMs perform well on defined diagnostic tasks but remain unreliable for autonomous clinical use. High error rates, inconsistent reasoning, and fabricated content require expert oversight and external validation before deployment.