Medical Lie Detector (MLD): A Hybrid System for Validating AI-Compiled Clinical Summaries


Abstract

Background: Accurate clinical documentation is critical for patient safety and care quality. Recent advances in artificial intelligence (AI) promise to streamline documentation, but concerns remain about the factual accuracy of auto-generated medical text. We propose the Medical Lie Detector (MLD), a hybrid Retrieval-Augmented Generation (RAG) and lexical system designed to validate clinical documents by detecting inaccuracies and unsupported claims.

Methods: The system combines natural language processing with dual-index retrieval (lexical BM25 and semantic vector search) to cross-check document content. It processes medical documents, retrieves relevant evidence from patient records and medical knowledge bases, and automatically generates pointed questions about the content. A validation pipeline flags potential inconsistencies, which can then be reviewed by human experts. We evaluated the system on a dataset of synthetic clinical notes representing 10 patients admitted for different reasons. AI-generated discharge summaries (produced by Gemini 2.0), with or without implanted factual errors, were evaluated against the facts identified in the original notes, measuring sensitivity, specificity, F1-score, and accuracy.

Results: The MLD identified documentation inaccuracies with high sensitivity (94%) and specificity (91%), corresponding to an F1-score of 0.92 and an overall accuracy of 93%. It effectively caught the injected factual errors. After human review, a few flagged inconsistencies were resolved as correct, and measured performance rose to near-perfect, indicating that the automated stage overestimates hallucinations.

Conclusions: Our results demonstrate that the system can substantially enhance the accuracy of medical documentation by flagging potential errors for review. This hybrid approach combines AI speed and consistency with human judgment as a safety net, aligning with emerging standards for reliable AI in healthcare.
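The evaluation metrics named above follow the standard binary-classification definitions over a confusion matrix, where a "positive" is an implanted factual error and a "flag" is the MLD marking a claim as unsupported. A minimal sketch of how such metrics are derived; the function name and the example counts are hypothetical (the counts are chosen only to illustrate values in the range the abstract reports, not taken from the study's data):

```python
def validation_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics for an error-detection task.

    tp: implanted errors correctly flagged
    fp: correct facts wrongly flagged (over-flagging / "hallucinated" errors)
    tn: correct facts left unflagged
    fn: implanted errors missed
    """
    sensitivity = tp / (tp + fn)            # recall: share of errors caught
    specificity = tn / (tn + fp)            # share of correct facts not flagged
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "f1": f1, "accuracy": accuracy}

# Hypothetical counts for illustration only
m = validation_metrics(tp=47, fp=9, tn=91, fn=3)
print(round(m["sensitivity"], 2), round(m["specificity"], 2))  # prints: 0.94 0.91
```

Note that a high false-positive count (fp) lowers specificity without touching sensitivity, which is exactly the pattern the Results describe: human review resolving spurious flags raises specificity toward 1.0 while sensitivity is unchanged.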
