WELDE: A Weighted Ensemble Loss with Diversity Enhancement for Imbalanced Object Detection in Medical Imaging
Abstract
Class imbalance in medical imaging datasets remains a key challenge for reliable object detection, particularly when rare yet clinically significant pathologies coexist with prevalent findings. In spinal MRI, common classes such as Normal Intervertebral Disc (IVD) may constitute over 45% of annotated objects, whereas findings like Spondylolisthesis account for fewer than 2% of instances. Conventional loss functions, including Focal Loss, Class-Balanced Loss, and Label-Distribution-Aware Margin Loss, each address isolated facets of this imbalance but do not provide a unified, adaptive solution. Inspired by ensemble loss strategies recently advanced in Deep Metric Learning (DML), we propose WELDE (Weighted Ensemble Loss with Diversity Enhancement), a framework that combines four complementary loss functions via per-head adapter projections, exponential moving average (EMA)-based normalization, and learnable adaptive weighting with a relaxed sum-to-one penalty. Each loss component receives a dedicated classification head with an independent adapter projection from a shared frozen backbone, enabling feature specialization without backbone fine-tuning. We provide a theoretical analysis of WELDE's properties, including gradient magnitude balancing across loss components and weight non-degeneracy. Applied to a lumbar mid-sagittal spinal MRI dataset with six classes and a 33.9:1 imbalance ratio, WELDE achieves the highest classification performance among all evaluated methods. It outperforms all single-loss baselines (mAP 0.702 vs. 0.689 for the best baseline, cross-entropy (CE); mAP\(_{\text{tail}}\) 0.509 vs. 0.472, a +8.1% relative improvement on tail classes) and an architecture-matched CE ensemble control (mAP\(_{\text{tail}}\) 0.509 vs. 0.496), confirming that the improvement derives from diverse loss composition rather than increased model capacity.
External cross-domain validation on the DermaMNIST skin lesion benchmark (7 classes, \(\rho = 58.3\)) confirms that WELDE generalizes robustly, achieving the highest mAP (0.709) and mAP\(_{\text{tail}}\) (0.651) among all methods, outperforming both single-head baselines (+11.5% mAP over CE) and the architecture-matched CE ensemble control.
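The loss combination described in the abstract (EMA-based normalization followed by learnable weighting with a relaxed sum-to-one penalty) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the class name `WeightedEnsembleLoss`, the softplus parameterization of the weights, the quadratic form of the penalty, and the default values of `ema_decay` and `penalty`. The per-head adapter projections and the four individual loss functions are left out; the sketch only takes their scalar values as input.

```python
import math


def softplus(x):
    # Smooth map onto positive values, so every weight stays above zero.
    return math.log1p(math.exp(x))


class WeightedEnsembleLoss:
    """Hypothetical combiner for per-head loss values (not the authors' code)."""

    def __init__(self, num_losses=4, ema_decay=0.99, penalty=1.0, eps=1e-8):
        self.ema_decay = ema_decay
        self.penalty = penalty
        self.eps = eps
        self.ema = [None] * num_losses
        # Initialise so that softplus(raw) = 1 / num_losses (uniform weights).
        init = math.log(math.exp(1.0 / num_losses) - 1.0)
        self.raw_weights = [init] * num_losses  # would be learnable parameters

    def weights(self):
        return [softplus(r) for r in self.raw_weights]

    def normalize(self, losses):
        # Divide each loss by its running mean so components share a scale.
        # In an autograd framework the running mean would be kept out of the
        # gradient graph (assumption).
        normalized = []
        for k, value in enumerate(losses):
            if self.ema[k] is None:
                self.ema[k] = value
            else:
                self.ema[k] = (self.ema_decay * self.ema[k]
                               + (1.0 - self.ema_decay) * value)
            normalized.append(value / (self.ema[k] + self.eps))
        return normalized

    def __call__(self, losses):
        w = self.weights()
        normalized = self.normalize(losses)
        weighted = sum(wk * lk for wk, lk in zip(w, normalized))
        # Relaxed constraint: weights are pulled toward summing to one
        # instead of being forced to, as a softmax would do.
        sum_to_one = self.penalty * (sum(w) - 1.0) ** 2
        return weighted + sum_to_one


combine = WeightedEnsembleLoss()
step_losses = [2.0, 0.5, 1.0, 4.0]  # placeholder values for the four heads
total = combine(step_losses)
```

Under these assumptions, losses of very different raw magnitude contribute equally on the first step, because each is divided by its own running mean. The penalty term is zero at initialisation and grows only when the learned weights drift away from summing to one.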