MammoVQA: A Benchmark for Breast Cancer Screening and Diagnosis in Mammogram Visual Question Answering

Abstract

Breast cancer remains the most prevalent malignancy among women worldwide, and mammography-based early detection plays a pivotal role in improving survival outcomes. While large vision-language models (LVLMs) offer transformative potential for mammogram visual question answering (VQA), the absence of standardized evaluation benchmarks limits their reliable clinical deployment. In this study, we address this gap through three key contributions: (1) We introduce MammoVQA, the first mammogram VQA dataset, unifying 11 public datasets into 104,914 images (337K QA pairs) for image-level analysis and 72,518 exams (476K images, 144K QA pairs) for exam-level analysis. (2) A systematic evaluation of 9 LVLMs (4 general-purpose, 5 medical) reveals diagnostic performance statistically equivalent to random guessing, highlighting their current unreliability for clinical breast cancer screening. (3) Our domain-optimized LLaVA-Mammo achieves an average +21.00% weighted-accuracy gain over the state of the art in internal validation and an average +22.99% weighted-accuracy improvement in external validation across 4 datasets.
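The abstract reports gains in weighted accuracy but does not spell out the metric. One common definition combines per-class accuracies with class weights; the sketch below assumes weights proportional to each class's share of the examples (the paper's exact weighting scheme may differ, and the function name and labels are illustrative):

```python
from collections import defaultdict

def weighted_accuracy(y_true, y_pred, class_weights=None):
    """Per-class accuracy combined into one score via class weights.

    Illustrative only: the paper does not define its exact formula here.
    By default, weights are each class's share of the examples.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    if class_weights is None:
        n = len(y_true)
        class_weights = {c: total[c] / n for c in total}
    # Sum of (weight * per-class recall) over the classes present
    return sum(class_weights[c] * correct[c] / total[c] for c in total)

# Hypothetical screening labels for illustration
y_true = ["benign", "benign", "malignant", "malignant"]
y_pred = ["benign", "malignant", "malignant", "malignant"]
print(weighted_accuracy(y_true, y_pred))  # → 0.75
```

With support-proportional weights this reduces to overall accuracy; passing uniform weights per class instead would yield balanced accuracy, another plausible reading of the metric.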