Precise prostate contours: setting the bar and meticulously evaluating AI performance
Abstract
Introduction
Evaluation of artificial intelligence (AI) algorithms for prostate segmentation is challenging because ground truth is lacking. We aimed to (1) create a reference standard dataset with precise prostate contours by expert consensus and (2) evaluate various AI tools against this standard.
Materials and methods
We obtained prostate MRI cases from six institutions in the Quantitative Prostate Imaging Consortium. A panel of four experts (two genitourinary radiologists and two prostate radiation oncologists) meticulously developed consensus prostate segmentations on axial T2-weighted series. We evaluated the performance of six AI tools (three commercially available, three academic) using Dice scores, distance from the reference contour, and volume error.
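The overlap and volume metrics named above can be illustrated with a minimal sketch. It assumes each segmentation is a set of (slice, row, col) voxel indices and that the voxel volume in mm³ is known (in practice it would come from the MRI header). The function names are hypothetical, and the volume error is taken here as the absolute difference relative to the reference volume, which is an assumption about how it was computed.

```python
# Minimal sketch (assumed representation): a segmentation is a set of
# (slice, row, col) voxel indices; voxel_mm3 is an assumed input.

def dice(ai_voxels: set, ref_voxels: set) -> float:
    """Dice = 2|A ∩ R| / (|A| + |R|); 1.0 means perfect overlap."""
    total = len(ai_voxels) + len(ref_voxels)
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * len(ai_voxels & ref_voxels) / total


def volume_error_pct(ai_voxels: set, ref_voxels: set, voxel_mm3: float) -> float:
    """Absolute volume error as a percentage of the reference volume."""
    ref_vol = len(ref_voxels) * voxel_mm3
    ai_vol = len(ai_voxels) * voxel_mm3
    return 100.0 * abs(ai_vol - ref_vol) / ref_vol


# Toy example: a 4x4 reference region and an AI mask shifted by one column.
ref = {(0, r, c) for r in range(4) for c in range(4)}
ai = {(0, r, c) for r in range(4) for c in range(1, 5)}
print(dice(ai, ref))                   # 12 shared voxels: 24 / 32
print(volume_error_pct(ai, ref, 0.5))  # same voxel count, so no volume error
```

The shifted mask shows why both metrics are reported: its volume is exact, yet a quarter of its voxels lie outside the reference, so Dice drops to 0.75.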
Results
The panel achieved consensus prostate segmentation on every slice of all 68 patient cases included in the reference dataset. We present two patient examples to serve as contouring guides. Depending on the AI tool, median Dice scores across patients ranged from 0.80 to 0.94 for whole-prostate segmentation.
For a typical (median) patient, AI tools had a mean error over the prostate surface ranging from 1.3 to 2.4 mm. Their maximum deviation for a typical patient ranged from 3.0 to 9.4 mm outside the prostate and from 3.0 to 8.5 mm inside it. Error in prostate volume measurement for a typical patient ranged from 4.3% to 31.4%.
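The distinction between mean surface error and maximum deviation outside versus inside the prostate can be sketched in two dimensions. This is a simplified sketch, not the study's method. It assumes one axial slice, with the reference contour given as a closed polygon of (x, y) points in mm and the AI contour as a list of sampled boundary points. Each AI point gets a signed distance to the nearest reference edge: positive if it lies outside the reference, negative if inside. All function names and the sign convention are hypothetical. A full evaluation would work on 3D surfaces and would usually also measure distances from the reference back to the AI contour.

```python
import math


def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _inside(p, polygon):
    """Even-odd ray-casting test: is p strictly inside the closed polygon?"""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def signed_distances(ai_points, ref_polygon):
    """Signed distance (mm) of each AI point: + outside, - inside the reference."""
    n = len(ref_polygon)
    result = []
    for p in ai_points:
        d = min(_point_segment_distance(p, ref_polygon[i], ref_polygon[(i + 1) % n])
                for i in range(n))
        result.append(-d if _inside(p, ref_polygon) else d)
    return result


def surface_error_summary(ai_points, ref_polygon):
    """Return (mean |error|, max deviation outside, max deviation inside) in mm."""
    d = signed_distances(ai_points, ref_polygon)
    mean_abs = sum(abs(v) for v in d) / len(d)
    max_outside = max([v for v in d if v > 0], default=0.0)
    max_inside = max([-v for v in d if v < 0], default=0.0)
    return mean_abs, max_outside, max_inside


# Toy example: a 10 mm x 10 mm square reference and four AI boundary points.
ref = [(0, 0), (10, 0), (10, 10), (0, 10)]
ai = [(12, 5), (5, 9), (5, -3), (8, 5)]
print(signed_distances(ai, ref))       # [2.0, -1.0, 3.0, -2.0]
print(surface_error_summary(ai, ref))  # (2.0, 3.0, 2.0)
```

Keeping the sign makes it possible to report over- and under-segmentation separately, as in the outside and inside ranges above. The mean absolute value summarizes the error over the whole contour.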
Discussion
We established an expert consensus benchmark for prostate segmentation. The best-performing AI tools have typical accuracy greater than that reported for radiation oncologists contouring on CT scans (the most common clinical approach for radiotherapy planning). Physician review remains essential to detect occasional major errors.