Segmenting with Confidence: Uncertainty Quantification for Brain Tumor Imaging
Abstract
Purpose: To develop and validate a deep learning framework that provides clinically meaningful uncertainty estimates for meningioma segmentation, enabling more trustworthy longitudinal volumetric assessment.
Materials and Methods: In this retrospective study, we developed an evidential deep learning (EDL) ensemble framework and trained it on 1,655 post-contrast T1-weighted brain MRIs from 788 patients with meningiomas. We evaluated the clinical utility of an architecturally heterogeneous ensemble on an independent test set of 68 MRIs from 43 patients and compared its performance with that of other state-of-the-art segmentation models and uncertainty estimation methods. The evaluation included: (1) assessment of Dice score and overall segmentation accuracy; (2) qualitative correspondence of spatial uncertainty maps with neuroradiologist-defined ambiguity; (3) quantitative calibration of volumetric credible intervals using empirical coverage, that is, whether the model’s intervals contained the true volume in 95% of cases; and (4) external validation on an independent cohort from another institution to confirm generalizability.
Results: High segmentation accuracy was achieved across all ensemble configurations (median Dice ≈ 0.93), with spatial uncertainty maps qualitatively aligning with regions rated as difficult by a neuroradiologist. Of all models tested, the heterogeneous EDL ensemble produced the most reliable volumetric credible intervals, capturing the true tumor volume in 92.8% of cases. External validation on an independent cohort of 353 patients with meningioma confirmed high generalizability, achieving a median Dice of 0.92.
Conclusion: Evidential deep learning ensembles provide well-calibrated uncertainty estimates while maintaining high segmentation accuracy. Architectural diversity enhances credible interval calibration, enabling more trustworthy single time-point and longitudinal assessments and supporting safer clinical deployment of automated meningioma segmentation. The methods presented here for meningioma are directly applicable to medical image lesion segmentation more broadly, promising to increase trust and safety in the use of AI in medical imaging.
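To make the two headline metrics concrete, the following is a minimal sketch of how a Dice score and the empirical coverage of volumetric credible intervals could be computed. It is not the authors' implementation: the function names `dice_score` and `empirical_coverage`, the flattened-mask representation, and the toy numbers are all assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (1 = tumor voxel)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def empirical_coverage(
    intervals: Sequence[Tuple[float, float]],
    true_volumes: Sequence[float],
) -> float:
    """Fraction of cases whose true volume lies inside its credible interval."""
    if len(intervals) != len(true_volumes) or not intervals:
        raise ValueError("need one interval per case and at least one case")
    hits = sum(1 for (lo, hi), v in zip(intervals, true_volumes) if lo <= v <= hi)
    return hits / len(intervals)


# Toy data (hypothetical, not taken from the study).
pred_mask = [1, 1, 0, 0, 1]
true_mask = [1, 0, 0, 1, 1]
toy_intervals = [(10.0, 14.0), (3.0, 5.0), (20.0, 26.0), (7.0, 9.0)]  # volume bounds
toy_volumes = [12.0, 5.5, 21.0, 8.0]                                  # reference volumes

dice = dice_score(pred_mask, true_mask)                     # 2*2 / (3+3)
coverage = empirical_coverage(toy_intervals, toy_volumes)   # 3 of 4 intervals hit
```

Under this reading, intervals are well calibrated when the empirical coverage is close to the nominal level, so the reported 92.8% would be compared against the 95% target.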