MammoVQA: A Benchmark for Breast Cancer Screening and Diagnosis in Mammogram Visual Question Answering

Abstract

Breast cancer remains the most prevalent malignancy among women worldwide, and mammography-based early detection plays a pivotal role in improving survival outcomes. While large vision-language models (LVLMs) offer transformative potential for mammogram visual question answering (VQA), the absence of standardized evaluation benchmarks limits their reliable clinical deployment. In this study, we address this gap through three key contributions: (1) We introduce MammoVQA, the first mammogram VQA dataset, unifying 11 public datasets into 104,914 images (337K QA pairs) for image-level analysis and 72,518 exams (476K images, 144K QA pairs) for exam-level analysis. (2) A systematic evaluation of 9 LVLMs (4 general-purpose, 5 medical) reveals diagnostic performance statistically equivalent to random guessing, highlighting their current unreliability for clinical breast cancer screening. (3) Our domain-optimized LLaVA-Mammo achieves an average +21.00% weighted-accuracy gain over the state of the art in internal validation and an average +22.99% weighted-accuracy improvement in external validation across 4 datasets.
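The abstract reports gains in weighted accuracy but does not spell out the metric. One common definition combines per-class accuracies with class weights; the sketch below assumes weights proportional to each class's share of the examples (the paper's exact weighting scheme may differ, and the function name and labels are illustrative):

```python
from collections import defaultdict

def weighted_accuracy(y_true, y_pred, class_weights=None):
    """Per-class accuracy combined into one score via class weights.

    Illustrative only: the paper does not define its exact formula here.
    By default, weights are each class's share of the examples.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    if class_weights is None:
        n = len(y_true)
        class_weights = {c: total[c] / n for c in total}
    # Sum of (weight * per-class recall) over the classes present
    return sum(class_weights[c] * correct[c] / total[c] for c in total)

# Hypothetical screening labels for illustration
y_true = ["benign", "benign", "malignant", "malignant"]
y_pred = ["benign", "malignant", "malignant", "malignant"]
print(weighted_accuracy(y_true, y_pred))  # → 0.75
```

With support-proportional weights this reduces to overall accuracy; passing uniform weights per class instead would yield balanced accuracy, another plausible reading of the metric.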