OrganSegBench: A Comprehensive Multi-Organ Benchmark for Segmentation Foundation Models with a Practical Synergy Pathway to Clinical Application
Abstract
Segmentation Foundation Models (SFMs), despite their success in general computer vision, remain suboptimal for medical imaging, where clinical requirements such as fairness and robustness are paramount. Existing benchmarks fail to address these needs and often rely on datasets with potential training data leakage. To bridge this gap, we introduce “OrganSegBench”, a comprehensive benchmark evaluation framework built on a new, high-quality data resource from 701 subjects with 16 annotated organs and detailed demographics. Using this resource, we systematically evaluated six state-of-the-art SFMs across five key dimensions: accuracy, generalization, robustness, fairness, and clinical utility. Our analysis uncovers a fundamental trade-off: the most accurate models are consistently the least fair, and no single model achieves excellence across all dimensions. To resolve this dilemma, we propose two ensemble strategies: training-free fusion and multi-source knowledge distillation. Notably, both approaches decisively outperformed every individual SFM across all evaluation dimensions, resolving the accuracy-fairness trade-off. These findings expose the inherent limitations of current monolithic SFMs and establish principled model synergy as a practical and superior pathway toward building safe, equitable, and clinically robust AI.