Independent Benchmarking of Prompt-Based Medical Segmentation Models
Abstract
Medical image segmentation is rapidly shifting toward vision(-language) foundation models that unify diverse modalities and tasks within a single framework. In this work, we systematically benchmark high-impact vision-language and segment-anything-based architectures across multiple clinically relevant CT and MRI tasks. We show that while these models achieve strong performance, each comes with specific advantages and disadvantages. Non-3D models are highly flexible but require substantial user guidance and are prone to over- or under-detection. 3D architectures offer more reliable volumetric consistency overall, but can still suffer from detection problems. Vision-language models appear sensitive to the coverage of their training data, whereas click-prompted SAM-based models are more universal, with a limited ability to address zero-shot targets. When tested with more complex text prompts, most vision-language models exhibit a lack of semantic language understanding. Overall, these models hold considerable promise but still exhibit limitations. Our work highlights key areas where future research is needed to advance vision(-language) foundation models.