Transcribing multilingual radiologist-patient dialogue into mammography reports using AI: A step towards patient-centric radiology
Abstract
Background
Radiology reports are primarily designed for healthcare professionals and often contain complex medical terminology that hinders patients from understanding their diagnostic results. This communication gap is especially pronounced in non-English-speaking regions. AI-driven transcription and report generation, leveraging automated speech recognition (ASR) and large language models (LLMs), could enable patient-centered, accessible reporting from radiologist-patient conversations in vernacular language.
Purpose
To evaluate the feasibility of AI-driven transcription and automated mammography report generation from simulated radiologist-patient conversations in vernacular language, assessing transcription accuracy, report concordance, error patterns, and time efficiency.
Materials and Methods
A curated dataset of 50 mammograms was retrospectively selected from the Picture Archiving and Communication System (PACS) of our department. Simulated radiologist-patient conversations, conducted in vernacular Hindi, were recorded and transcribed using the OpenAI Whisper large-v2 ASR model. Four transcriptions per conversation were generated at different temperatures (0, 0.3, 0.5, 0.7) to maximize information capture. Structured mammography reports were generated from the transcriptions using GPT-4o, guided by detailed prompt instructions. Reports were reviewed and corrected by a radiologist, and AI performance was assessed through word error rate (WER), character error rate (CER), report concordance rates, error analysis, and time efficiency metrics.
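The WER and CER used to score the transcriptions are standard edit-distance metrics: the minimum number of insertions, deletions, and substitutions needed to turn the hypothesis into the reference, normalized by reference length, computed over words for WER and over characters for CER. A minimal sketch of this computation (not the authors' exact implementation, which is not specified in the abstract):

```python
def edit_distance(ref, hyp):
    # Levenshtein distance between two sequences, via dynamic programming
    # with a single rolling row to keep memory at O(len(hyp)).
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    # Word error rate: word-level edit distance / reference word count.
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # Character error rate: character-level edit distance / reference length.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For Hindi speech, both metrics operate on the Devanagari text itself; CER is often more informative than WER for such scripts because word segmentation in ASR output can be inconsistent.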
Results
The lowest WER (0.577) and CER (0.379) were observed at temperature 0. The overall mean concordance rate between AI-generated and radiologist-edited reports was 0.94, with structured fields achieving higher concordance than descriptive fields. Errors were present in 50% of AI-generated reports, predominantly missed and incorrect information, with a higher error rate in malignant cases. The mean time for AI-driven report generation was 207.4 seconds, with radiologist editing contributing 43.1 seconds on average.
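Concordance between AI-generated and radiologist-edited reports can be computed as the fraction of report fields in which the two versions agree. A minimal sketch under that assumption; the field names shown (e.g. "birads", "density") are illustrative, not the authors' actual report template:

```python
def field_concordance(ai_report, edited_report):
    # Fraction of fields where the AI-generated value matches the
    # radiologist-edited value. Reports are modeled as dicts mapping
    # field name -> value; field names here are illustrative assumptions.
    fields = edited_report.keys()
    matches = sum(ai_report.get(f) == edited_report[f] for f in fields)
    return matches / len(fields)

# Illustrative example: one of two fields agrees -> concordance 0.5.
ai = {"birads": "2", "density": "b"}
edited = {"birads": "2", "density": "c"}
```

Structured fields (such as a categorical assessment) lend themselves to this exact-match comparison, which is consistent with the abstract's observation that they achieved higher concordance than free-text descriptive fields.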
Conclusion
An AI-driven workflow integrating ASR and LLMs to generate structured mammography reports from radiologist-patient conversations in vernacular language is feasible. While challenges such as privacy, validation, and scalability remain, this approach represents a significant step toward patient-centric, AI-integrated radiology practice.