Bridging the Semantic Gaps: Improving MVQA Consistency with LLM-Augmented Question Sets
Abstract
Purpose: We address a critical yet under-studied weakness of Medical Visual Question Answering (MVQA): models often flip their answers when clinicians phrase the same diagnostic query differently. We ask whether large language model (LLM)-driven data augmentation can make MVQA robust to paraphrasing.

Methods: Our Semantically Equivalent Question Augmentation (SEQA) pipeline prompts a foundation LLM to rewrite each question into ten meaning-preserving variants, enriching linguistic diversity while keeping the linked image and ground-truth answer fixed. Three new diversity indices (ANQI, ANQA, ANQS) quantify dataset breadth, and a joint metric, TAR-SC, scores models on both accuracy and across-paraphrase agreement.

Results: Augmenting SLAKE, VQA-RAD and PathVQA multiplied question–answer coverage by ×1.86, ×1.85 and ×1.46, respectively. Fine-tuning three representative backbones (M2I2, MUMC and BiomedGPT) on the enriched data raised mean answer accuracy by 19.4% and TAR-SC by 11.6% over their original protocols, with gains persisting in zero-shot tests.

Conclusion: Injecting linguistically diverse yet semantically equivalent questions closes the "paraphrase trap," yielding MVQA systems that are markedly more stable, accurate, and therefore safer for clinical deployment.
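To make the Methods step concrete, the sketch below shows one way an SEQA-style augmentation could look in practice: a foundation LLM is asked for meaning-preserving rewrites of each question while the image and ground-truth answer stay frozen. The prompt wording and the `llm_generate` helper are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch of a SEQA-style augmentation step (assumed prompt and helper names).
# `llm_generate` stands in for whatever chat-completion call the foundation LLM exposes.

from typing import Callable, Dict, List

PARAPHRASE_PROMPT = (
    "Rewrite the following medical visual question in {n} different ways. "
    "Each rewrite must keep exactly the same meaning and the same expected answer. "
    "Return one paraphrase per line.\n\nQuestion: {question}"
)

def augment_qa_pair(
    sample: Dict[str, str],
    llm_generate: Callable[[str], str],
    n_variants: int = 10,
) -> List[Dict[str, str]]:
    """Expand one (image, question, answer) triple into semantically
    equivalent variants; the image and ground-truth answer are left unchanged."""
    prompt = PARAPHRASE_PROMPT.format(n=n_variants, question=sample["question"])
    raw = llm_generate(prompt)
    paraphrases = [line.strip() for line in raw.splitlines() if line.strip()]

    variants = [sample]  # keep the original phrasing alongside the rewrites
    for q in paraphrases[:n_variants]:
        variants.append(
            {"image": sample["image"], "question": q, "answer": sample["answer"]}
        )
    return variants
```

In this reading, a consistency-oriented metric such as TAR-SC would then be computed by grouping a model's predictions over each variant set and rewarding agreement with the shared ground-truth answer across all paraphrases, rather than scoring each question in isolation.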