Precise prostate contours: setting the bar and meticulously evaluating AI performance

Yuze Song
Anna Dornisch
Robert T Dess
Daniel JA Margolis
Eric P Weinberg
Tristan Barrett
Mariel Cornell
Richard E Fan
Mukesh Harisinghani
Sophia C. Kamran
Jeong Hoon Lee
Cynthia Xinran Li
Michael A Liss
Mirabela Rusu
Jason Santos
Geoffrey A Sonn
Igor Vidic
Sean A Woolen
Anders M Dale
Tyler M Seibert

Abstract

Introduction

Evaluation of artificial intelligence (AI) algorithms for prostate segmentation is challenging because ground truth is lacking. We aimed to (1) create a reference standard dataset with precise prostate contours by expert consensus and (2) evaluate various AI tools against this standard.

Materials and methods

We obtained prostate MRI cases from six institutions in the Quantitative Prostate Imaging Consortium. A panel of four experts (two genitourinary radiologists, two prostate radiation oncologists) meticulously developed consensus prostate segmentations on axial T₂-weighted series. We evaluated the performance of six AI tools (three commercially available, three academic) using Dice scores, distance from reference contour, and volume error.
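As a rough illustration of two of these metrics, the sketch below computes a Dice score and a percent volume error for a pair of binary masks. It is not the authors' code: the function names, the `voxel_volume_mm3` parameter, and the toy masks are assumptions made for this example, and the abstract does not say whether volume error is signed, so an absolute error is assumed here.

```python
import numpy as np


def dice_score(pred, ref):
    """Dice overlap between two binary masks: 2|A and B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Both masks empty: treat as perfect agreement (a convention, not from the source).
        return 1.0
    return 2.0 * np.logical_and(pred, ref).sum() / denom


def volume_error_percent(pred, ref, voxel_volume_mm3=1.0):
    """Absolute volume difference as a percentage of the reference volume."""
    pred_vol = np.asarray(pred, dtype=bool).sum() * voxel_volume_mm3
    ref_vol = np.asarray(ref, dtype=bool).sum() * voxel_volume_mm3
    if ref_vol == 0:
        raise ValueError("reference mask is empty")
    return 100.0 * abs(pred_vol - ref_vol) / ref_vol


# Toy masks (hypothetical): a 6x6x6 reference cube and a prediction
# that extends one voxel too far along the last axis.
ref = np.zeros((10, 10, 10), dtype=bool)
ref[2:8, 2:8, 2:8] = True            # 216 voxels
pred = np.zeros_like(ref)
pred[2:8, 2:8, 2:9] = True           # 252 voxels, fully containing the reference

print(f"Dice: {dice_score(pred, ref):.3f}")
print(f"Volume error: {volume_error_percent(pred, ref):.1f}%")
```

In this toy case the prediction over-segments by one slab of voxels, which lowers the Dice score only slightly while producing a volume error of about 17%. This shows why the two metrics are worth reporting together.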

Results

The panel achieved consensus prostate segmentation on each slice of all 68 patient cases included in the reference dataset. We present two patient examples to serve as contouring guides. Depending on the AI tool, median Dice scores (across patients) ranged from 0.80 to 0.94 for whole prostate segmentation.

For a typical (median) patient, AI tools had a mean error over the prostate surface ranging from 1.3 to 2.4 mm. Maximum deviations for a typical patient ranged from 3.0 to 9.4 mm outside the prostate and from 3.0 to 8.5 mm inside the prostate. Error in prostate volume measurement for a typical patient ranged from 4.3% to 31.4%.
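One plausible way to obtain surface errors of this kind is to measure, at every voxel on the AI contour's surface, the distance to the reference surface, signed by whether the voxel lies outside or inside the reference prostate. The sketch below does this with SciPy's Euclidean distance transform. The abstract does not give the authors' exact definition, so the function names, the `spacing_mm` parameter, and the voxel-based surface extraction are all assumptions.

```python
import numpy as np
from scipy import ndimage


def surface_voxels(mask):
    """Voxels of the mask that touch the background (a one-voxel-thick shell)."""
    mask = np.asarray(mask, dtype=bool)
    return mask & ~ndimage.binary_erosion(mask)


def signed_surface_errors(pred, ref, spacing_mm=(1.0, 1.0, 1.0)):
    """Summarise distances from the predicted surface to the reference surface.

    Distances are positive where the predicted surface lies outside the
    reference mask and negative where it lies inside.
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    ref_surface = surface_voxels(ref)
    # Distance (in mm) from every voxel to the nearest reference-surface voxel.
    dist_to_ref = ndimage.distance_transform_edt(~ref_surface, sampling=spacing_mm)
    signed = np.where(ref, -dist_to_ref, dist_to_ref)
    errors = signed[surface_voxels(pred)]
    if errors.size == 0:
        raise ValueError("predicted mask is empty")
    return {
        "mean_abs_mm": float(np.abs(errors).mean()),
        "max_outside_mm": float(max(errors.max(), 0.0)),
        "max_inside_mm": float(max(-errors.min(), 0.0)),
    }


# Toy masks (hypothetical): the prediction is the reference cube grown by one voxel.
ref_cube = np.zeros((20, 20, 20), dtype=bool)
ref_cube[5:15, 5:15, 5:15] = True
pred_cube = np.zeros_like(ref_cube)
pred_cube[4:16, 4:16, 4:16] = True

print(signed_surface_errors(pred_cube, ref_cube))
```

Per-patient values computed this way could then be summarised across patients with a median, matching the "typical patient" framing used in the results. A production implementation would more likely work on contour meshes or sub-voxel surfaces than on voxel shells.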

Discussion

We established an expert consensus benchmark for prostate segmentation. The best-performing AI tools have typical accuracy greater than that reported for radiation oncologists using CT scans (the most common clinical approach for radiotherapy planning). Physician review remains essential to detect occasional major errors.

Version published to 10.1101/2024.10.21.24315771 on medRxiv
Oct 22, 2024

