Enhancing Clinical Reasoning in Medical Vision-Language Model through Structured Prompts
Abstract
Medical Vision-Language Models (MVLMs) are emerging as powerful tools for tasks such as Visual Question Answering (VQA); however, they often struggle with hallucination and limited reasoning transparency, particularly in complex diagnostic scenarios. In this work, we enhance the MedVLM-R1 framework by fine-tuning it with clinically informed prompt structures tailored specifically to radiology-based reasoning. Without altering the original model architecture or training strategy, we redesign the system prompts and question templates to guide the model through structured, modality-aware, step-by-step diagnostic reasoning. Fine-tuning is performed on MRI-based question-answer (QA) pairs, and evaluations are conducted across three diagnostic imaging modalities (MRI, CT, and X-ray) to assess both in-domain and out-of-domain generalization. Our approach improves reasoning transparency and accuracy, achieving 96.00% accuracy on MRI, 72.67% on CT, and 75.2% on X-ray. Compared to the original MedVLM-R1, our method closes the gap in MRI accuracy while significantly enhancing generalization performance on the CT and X-ray modalities. These results demonstrate that clinically grounded prompting effectively improves both reasoning fidelity and robustness across imaging modalities. The code is available at our GitHub repository: https://github.com/aidanbio/AIdanMed
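To make the prompting idea concrete, the sketch below shows what a structured, modality-aware question template of this kind might look like. It is a minimal illustration only: the exact system prompt, question wording, and answer tags used in the paper are defined in the linked repository, and every string, function name, and tag here (e.g. `SYSTEM_PROMPT`, `build_question_prompt`, the `<think>`/`<answer>` markers) is an assumption for demonstration, not the authors' implementation.

```python
# Hypothetical structured prompt template in the spirit of the approach described
# above: identify the modality, reason step by step, then commit to one answer.
# All wording and tag names are illustrative assumptions.

SYSTEM_PROMPT = (
    "You are a radiology assistant. Reason step by step:\n"
    "1. Identify the imaging modality (MRI, CT, or X-ray).\n"
    "2. Describe the relevant findings in the image.\n"
    "3. Relate the findings to each answer option.\n"
    "4. State the single best answer.\n"
    "Put your reasoning inside <think>...</think> and the final choice inside <answer>...</answer>."
)

def build_question_prompt(question: str, options: list[str]) -> str:
    """Format a VQA question and its options into the structured template."""
    option_block = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    return (
        f"{question}\n"
        f"Options:\n{option_block}\n"
        "Respond with the option letter only inside <answer>...</answer>."
    )

if __name__ == "__main__":
    # Example usage with a hypothetical MRI question.
    print(SYSTEM_PROMPT)
    print(build_question_prompt(
        "Which abnormality is most consistent with the MRI shown?",
        ["Glioblastoma", "Meningioma", "Normal study"],
    ))
```

Under these assumptions, the same template is reused verbatim at evaluation time for CT and X-ray questions, which is what allows the out-of-domain comparison to isolate the effect of the prompt structure rather than the prompt wording per modality.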