Bridging the Semantic Gaps: Improving MVQA Consistency with LLM-Augmented Question Sets
Abstract
Purpose: We address a critical yet under-studied weakness of Medical Visual Question Answering (MVQA): models often flip their answers when clinicians phrase the same diagnostic query differently. We ask whether large language model (LLM)-driven data augmentation can make MVQA robust to paraphrasing.

Methods: Our Semantically Equivalent Question Augmentation (SEQA) pipeline prompts a foundation LLM to rewrite each question into ten meaning-preserving variants, enriching linguistic diversity while keeping the linked image and ground-truth answer fixed. Three new diversity indices (ANQI, ANQA, ANQS) quantify dataset breadth, and a joint metric, TAR-SC, scores models on both accuracy and across-paraphrase agreement.

Results: Augmenting SLAKE, VQA-RAD and PathVQA multiplied question–answer coverage by ×1.86, ×1.85 and ×1.46, respectively. Fine-tuning three representative backbones (M2I2, MUMC and BiomedGPT) on the enriched data raised mean answer accuracy by 19.4% and TAR-SC by 11.6% over their original protocols, with gains persisting in zero-shot tests.

Conclusion: Injecting linguistically diverse yet semantically equivalent questions closes the "paraphrase trap," yielding MVQA systems that are markedly more stable, accurate, and therefore safer for clinical deployment.
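To make the Methods step concrete, the sketch below shows one way an SEQA-style augmentation could look in practice: a foundation LLM is asked for meaning-preserving rewrites of each question while the image and ground-truth answer stay frozen. The prompt wording and the `llm_generate` helper are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch of a SEQA-style augmentation step (assumed prompt and helper names).
# `llm_generate` stands in for whatever chat-completion call the foundation LLM exposes.

from typing import Callable, Dict, List

PARAPHRASE_PROMPT = (
    "Rewrite the following medical visual question in {n} different ways. "
    "Each rewrite must keep exactly the same meaning and the same expected answer. "
    "Return one paraphrase per line.\n\nQuestion: {question}"
)

def augment_qa_pair(
    sample: Dict[str, str],
    llm_generate: Callable[[str], str],
    n_variants: int = 10,
) -> List[Dict[str, str]]:
    """Expand one (image, question, answer) triple into semantically
    equivalent variants; the image and ground-truth answer are left unchanged."""
    prompt = PARAPHRASE_PROMPT.format(n=n_variants, question=sample["question"])
    raw = llm_generate(prompt)
    paraphrases = [line.strip() for line in raw.splitlines() if line.strip()]

    variants = [sample]  # keep the original phrasing alongside the rewrites
    for q in paraphrases[:n_variants]:
        variants.append(
            {"image": sample["image"], "question": q, "answer": sample["answer"]}
        )
    return variants
```

In this reading, a consistency-oriented metric such as TAR-SC would then be computed by grouping a model's predictions over each variant set and rewarding agreement with the shared ground-truth answer across all paraphrases, rather than scoring each question in isolation.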