Comparative analysis of privacy-preserving open-source LLMs regarding extraction of diagnostic information from clinical CMR imaging reports
Abstract
We investigated the use of privacy-preserving, locally deployed, open-source Large Language Models (LLMs) to extract diagnostic information from free-text cardiovascular magnetic resonance (CMR) reports. We evaluated nine open-source LLMs on their ability to identify diagnoses and classify patients into cardiac diagnostic categories based on the descriptive findings in 109 clinical CMR reports. Performance was quantified using standard classification metrics, including accuracy, precision, recall, and F1 score, and confusion matrices were used to examine patterns of misclassification across models. Most open-source LLMs classified the reports into the different diagnostic categories with excellent performance. Google’s Gemma 2 model achieved the highest average F1 score of 0.98, followed by Qwen2.5:32B and DeepseekR1-32B with F1 scores of 0.96 and 0.95, respectively. All other evaluated models attained average scores above 0.93, with Mistral and DeepseekR1-7B being the only exceptions. The top four LLMs outperformed our board-certified cardiologist (F1 score of 0.94) across all evaluation metrics in analyzing CMR reports. Our findings demonstrate the feasibility of implementing open-source, privacy-preserving LLMs in clinical settings for automated analysis of imaging reports, enabling accurate, fast, and resource-efficient diagnostic categorization.
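To make the kind of pipeline described above concrete, the sketch below prompts a locally served open-source model to assign a diagnostic category to a free-text report and scores the predictions with the metrics named in the abstract. This is a minimal illustration only, not the authors' pipeline: the use of the Ollama client, the prompt wording, the category list, and the scikit-learn evaluation are assumptions made for this example.

```python
# Minimal sketch (not the authors' method): classify a free-text CMR report with a
# locally running open-source LLM, then evaluate predictions with accuracy,
# macro-averaged precision/recall/F1, and a confusion matrix.
# Assumes a local Ollama server is running and scikit-learn is installed.
import ollama
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Hypothetical category list; the paper's actual diagnostic categories are not reproduced here.
CATEGORIES = ["cardiomyopathy", "myocarditis", "ischemic heart disease", "normal"]

PROMPT = (
    "Read the CMR report below and answer with exactly one category from this list: "
    f"{', '.join(CATEGORIES)}."
    "\n\nReport:\n{report}"
)


def classify_report(report_text: str, model: str = "qwen2.5:32b") -> str:
    """Send one report to the locally served model and return its predicted category."""
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(report=report_text)}],
    )
    return response["message"]["content"].strip().lower()


def evaluate(y_true: list[str], y_pred: list[str]) -> dict:
    """Compute the classification metrics reported in the abstract."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=CATEGORIES, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=CATEGORIES),
    }
```

In the study's setting, `y_true` would hold the reference labels for the 109 reports and `y_pred` the categories returned by each of the nine models, so that per-model average F1 scores and confusion matrices can be compared as in the abstract.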