Comparative analysis of privacy-preserving open-source LLMs regarding extraction of diagnostic information from clinical CMR imaging reports
Abstract
We investigated the use of privacy-preserving, locally deployed, open-source Large Language Models (LLMs) to extract diagnostic information from free-text cardiovascular magnetic resonance (CMR) reports. We evaluated nine open-source LLMs on their ability to identify diagnoses and classify patients into cardiac diagnostic categories based on the descriptive findings in 109 clinical CMR reports. Performance was quantified using standard classification metrics, including accuracy, precision, recall, and F1 score, and confusion matrices were used to examine patterns of misclassification across models. Most open-source LLMs classified the reports into the different diagnostic categories with excellent performance. Google’s Gemma 2 model achieved the highest average F1 score of 0.98, followed by Qwen2.5:32B and DeepseekR1-32B with F1 scores of 0.96 and 0.95, respectively. All other evaluated models attained average scores above 0.93, with Mistral and DeepseekR1-7B being the only exceptions. The top four LLMs outperformed our board-certified cardiologist (F1 score of 0.94) across all evaluation metrics in analyzing CMR reports. Our findings demonstrate the feasibility of implementing open-source, privacy-preserving LLMs in clinical settings for automated analysis of imaging reports, enabling accurate, fast, and resource-efficient diagnostic categorization.
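To make the kind of pipeline described above concrete, the sketch below prompts a locally served open-source model to assign a diagnostic category to a free-text report and scores the predictions with the metrics named in the abstract. This is a minimal illustration only, not the authors' pipeline: the use of the Ollama client, the prompt wording, the category list, and the scikit-learn evaluation are assumptions made for this example.

```python
# Minimal sketch (not the authors' method): classify a free-text CMR report with a
# locally running open-source LLM, then evaluate predictions with accuracy,
# macro-averaged precision/recall/F1, and a confusion matrix.
# Assumes a local Ollama server is running and scikit-learn is installed.
import ollama
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Hypothetical category list; the paper's actual diagnostic categories are not reproduced here.
CATEGORIES = ["cardiomyopathy", "myocarditis", "ischemic heart disease", "normal"]

PROMPT = (
    "Read the CMR report below and answer with exactly one category from this list: "
    f"{', '.join(CATEGORIES)}."
    "\n\nReport:\n{report}"
)


def classify_report(report_text: str, model: str = "qwen2.5:32b") -> str:
    """Send one report to the locally served model and return its predicted category."""
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(report=report_text)}],
    )
    return response["message"]["content"].strip().lower()


def evaluate(y_true: list[str], y_pred: list[str]) -> dict:
    """Compute the classification metrics reported in the abstract."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=CATEGORIES, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=CATEGORIES),
    }
```

In the study's setting, `y_true` would hold the reference labels for the 109 reports and `y_pred` the categories returned by each of the nine models, so that per-model average F1 scores and confusion matrices can be compared as in the abstract.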