Bridging the Semantic Gaps: Improving MVQA Consistency with LLM-Augmented Question Sets

Abstract

Purpose: We confront a critical yet under-studied weakness of Medical Visual Question Answering (MVQA): models often flip their answers when clinicians phrase the same diagnostic query differently. We ask whether large language model (LLM)-driven data augmentation can make MVQA robust to paraphrasing.

Methods: Our Semantically Equivalent Question Augmentation (SEQA) pipeline prompts a foundation LLM to rewrite each question into ten meaning-preserving variants, enriching linguistic diversity while keeping the linked image and ground-truth answer fixed. Three new diversity indices (ANQI, ANQA, ANQS) quantify dataset breadth, and a joint metric, TAR-SC, scores models on both accuracy and across-paraphrase agreement.

Results: Augmenting SLAKE, VQA-RAD and PathVQA multiplied question-answer coverage by factors of 1.86, 1.85 and 1.46, respectively. Fine-tuning three representative backbones (M2I2, MUMC and BiomedGPT) on the enriched data raised mean answer accuracy by 19.4% and TAR-SC by 11.6% over their original training protocols, with gains persisting in zero-shot tests.

Conclusion: Injecting linguistically diverse yet semantically equivalent questions defuses the "paraphrase trap," yielding MVQA systems that are markedly more stable, accurate, and thus safer for clinical deployment.
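The abstract does not include the authors' prompt or client code, but the augmentation step it describes is simple enough to sketch. The Python snippet below assumes a generic `llm_complete` callable standing in for whatever LLM API is used; the function names, prompt wording, and JSON output contract are illustrative assumptions, not the paper's implementation.

```python
import json

def generate_paraphrases(question: str, llm_complete, n_variants: int = 10) -> list[str]:
    """Ask an LLM for meaning-preserving rewrites of one MVQA question.

    `llm_complete` is a hypothetical callable: prompt string in, model
    text out. Swap in whatever client the chosen foundation LLM exposes.
    """
    prompt = (
        f"Rewrite the medical question below in {n_variants} different ways. "
        "Preserve the exact clinical meaning; change only the wording. "
        "Return a JSON list of strings.\n\n"
        f"Question: {question}"
    )
    variants = json.loads(llm_complete(prompt))
    # Drop empty strings and duplicates (including echoes of the original);
    # each kept variant inherits the original image and ground-truth answer.
    seen, kept = {question.strip().lower()}, []
    for v in variants:
        key = v.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(v.strip())
    return kept[:n_variants]

def augment(dataset, llm_complete):
    """Expand (image, question, answer) triples with paraphrase variants."""
    out = []
    for image, question, answer in dataset:
        out.append((image, question, answer))
        for variant in generate_paraphrases(question, llm_complete):
            out.append((image, variant, answer))
    return out
```

Because duplicates and failed rewrites are filtered out, the effective dataset growth stays below the nominal eleven-fold, consistent with the sub-2x coverage factors reported above.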
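TAR-SC is named but not defined in the abstract. One plausible reading of "accuracy plus across-paraphrase agreement" is an all-or-nothing group score: a paraphrase group counts only if every variant receives the same prediction and that prediction is correct. The scorer below implements that assumed definition; the field names and the group rule are assumptions, not the paper's formula.

```python
from collections import defaultdict

def tar_sc(records: list[dict]) -> float:
    """Assumed TAR-SC scorer over paraphrase groups.

    `records` are dicts with keys 'group_id' (shared by all paraphrases of
    one original question), 'prediction', and 'answer' (shared ground truth).
    """
    groups = defaultdict(list)
    for r in records:
        groups[r["group_id"]].append(r)
    if not groups:
        return 0.0
    hits = 0
    for recs in groups.values():
        preds = {r["prediction"] for r in recs}
        # One unique prediction => consistent; matching the answer => accurate.
        if len(preds) == 1 and preds.pop() == recs[0]["answer"]:
            hits += 1
    return hits / len(groups)
```

Under this reading, a model that is accurate on original questions but flips answers on rewrites scores poorly, which is exactly the instability the paper targets.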
