Medical Lie Detector (MLD): A Hybrid System for Validating AI-Compiled Clinical Summaries


Abstract

Background: Accurate clinical documentation is critical for patient safety and care quality. Recent advances in artificial intelligence (AI) promise to streamline documentation, but concerns remain about the factual accuracy of auto-generated medical text. We propose the Medical Lie Detector (MLD), a hybrid Retrieval-Augmented Generation (RAG) and lexical system designed to validate clinical documents by detecting inaccuracies and unsupported claims.

Methods: The system combines natural language processing with dual-index retrieval (lexical BM25 and semantic vector search) to cross-check document content. It processes medical documents, retrieves relevant evidence from patient records and medical knowledge bases, and automatically generates pointed questions about the content. A validation pipeline flags potential inconsistencies, which can then be reviewed by human experts. We evaluated the system on a dataset of synthetic clinical notes representing 10 patients admitted for different reasons. AI-generated discharge summaries (produced by Gemini 2.0), with or without implanted factual errors, were evaluated against the facts identified in the original notes, measuring sensitivity, specificity, F1-score, and accuracy.

Results: The MLD identified documentation inaccuracies with high sensitivity (94%) and specificity (91%), corresponding to an F1-score of 0.92 and an overall accuracy of 93%. It effectively caught the injected factual errors. After human review, a few flagged inconsistencies were resolved as correct, and measured performance rose to near-perfect, indicating that the automated stage overestimates hallucinations.

Conclusions: Our results demonstrate that the system can substantially enhance the accuracy of medical documentation by flagging potential errors for review. This hybrid approach combines AI speed and consistency with human judgment as a safety net, aligning with emerging standards for reliable AI in healthcare.
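The evaluation metrics named above follow the standard binary-classification definitions over a confusion matrix, where a "positive" is an implanted factual error and a "flag" is the MLD marking a claim as unsupported. A minimal sketch of how such metrics are derived; the function name and the example counts are hypothetical (the counts are chosen only to illustrate values in the range the abstract reports, not taken from the study's data):

```python
def validation_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics for an error-detection task.

    tp: implanted errors correctly flagged
    fp: correct facts wrongly flagged (over-flagging / "hallucinated" errors)
    tn: correct facts left unflagged
    fn: implanted errors missed
    """
    sensitivity = tp / (tp + fn)            # recall: share of errors caught
    specificity = tn / (tn + fp)            # share of correct facts not flagged
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "f1": f1, "accuracy": accuracy}

# Hypothetical counts for illustration only
m = validation_metrics(tp=47, fp=9, tn=91, fn=3)
print(round(m["sensitivity"], 2), round(m["specificity"], 2))  # prints: 0.94 0.91
```

Note that a high false-positive count (fp) lowers specificity without touching sensitivity, which is exactly the pattern the Results describe: human review resolving spurious flags raises specificity toward 1.0 while sensitivity is unchanged.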
