Rethinking Medical LLM Hallucinations: A System-Level Survey
Abstract
Large language models (LLMs) demonstrate strong performance across biomedical and clinical tasks, yet their deployment in healthcare remains limited by hallucination. Prior research has often treated hallucination as an isolated model failure to be addressed through improved training, prompting, or retrieval. However, emerging theoretical and empirical evidence suggests that hallucination is a structural property of probabilistic language generation rather than a fully removable bug. This distinction is particularly critical in medicine, where near-correct answers, fabricated evidence, and unsafe recommendations can introduce real patient risk and legal liability. In this paper, we present a system-level survey of hallucination in medical LLMs. Rather than exhaustively cataloging prior work, we aim to highlight the dominant research directions and analyze the problem from a systems perspective. We synthesize literature spanning definitions, taxonomies, benchmarks, detection methods, and mitigation strategies, and examine how these components interact within real clinical workflows. Our analysis shows that, despite diverse models and technical advances, improvements to individual components rarely translate into reliable end-to-end systems. Based on this synthesis, we argue that hallucination in healthcare should be treated as a system-level risk management problem rather than a model-level defect. We outline key open challenges and emphasize the need to understand not only how to reduce hallucinations, but why they occur and how their impact propagates through clinical decision pipelines. Ultimately, progress toward trustworthy medical AI will depend on designing systems that anticipate, monitor, and safely manage hallucinations rather than assuming they can be fully eliminated.