A Method for Sensitivity Analysis of Automatic Contouring Algorithms Across Different MRI Contrast Weightings Using SyntheticMR
Abstract
Background
Currently, most institution-specific automatic MRI-based contouring algorithms are trained, tested, and validated on a single contrast weighting (e.g., T2-weighted). However, their performance within that contrast weighting, that is, across different combinations of repetition time (TR) and echo time (TE), is under-investigated and poorly understood. As a result, external institutions that use different scan protocols for the same contrast weighting may experience suboptimal performance.
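To illustrate why TR and TE matter even within one nominal weighting, the sketch below uses the textbook idealized spin-echo signal model, S = PD · (1 − exp(−TR/T1)) · exp(−TE/T2). This is an assumption for illustration only: it is not SyMRI's internal model, and the two tissues' PD, T1, and T2 values are hypothetical numbers chosen to make the effect visible, not measured gland properties.

```python
import numpy as np

def spin_echo_signal(pd, t1_ms, t2_ms, tr_ms, te_ms):
    """Idealized spin-echo signal: S = PD * (1 - exp(-TR/T1)) * exp(-TE/T2).

    Ignores flip-angle imperfections, noise, and coil sensitivity.
    """
    return pd * (1.0 - np.exp(-tr_ms / t1_ms)) * np.exp(-te_ms / t2_ms)

# Hypothetical tissue parameters, chosen for illustration only (not measured values).
TISSUE_A = {"pd": 0.8, "t1_ms": 800.0, "t2_ms": 60.0}
TISSUE_B = {"pd": 0.9, "t1_ms": 1500.0, "t2_ms": 120.0}

def contrast(tr_ms, te_ms):
    """Normalized contrast (S_B - S_A) / (S_B + S_A) for one TR/TE pair."""
    s_a = spin_echo_signal(**TISSUE_A, tr_ms=tr_ms, te_ms=te_ms)
    s_b = spin_echo_signal(**TISSUE_B, tr_ms=tr_ms, te_ms=te_ms)
    return (s_b - s_a) / (s_b + s_a)

# One short-TR/short-TE setting and two long-TR/long-TE settings.
for tr, te in [(500, 10), (3000, 80), (4000, 100)]:
    print(f"TR={tr} ms, TE={te} ms: contrast={contrast(tr, te):+.3f}")
```

Under these assumed values, the short TR/TE pair makes tissue A brighter (negative contrast), while both long TR/TE pairs make tissue B brighter but by noticeably different amounts, so two protocols that are both "T2-weighted" can still present a model with different image contrast.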
Purpose
The purpose of this study was to develop a method to evaluate the robustness of automatic contouring algorithms to varying MRI contrast weightings.
Methods
One healthy volunteer and one patient were scanned using SyntheticMR on the MR simulator. The parotid and submandibular glands of both subjects were contoured using an automatic contouring algorithm trained on T2-weighted MRIs. For ground-truth manual contours, two radiation oncology residents and one pre-resident physician were recruited, and the STAPLE consensus of their contours was determined. A total of 216 different TR and TE combinations spanning the T1-, T2-, and PD-weighted contrast ranges were simulated using SyntheticMR's post-processing software, SyMRI. The automatic contours were compared with the ground truth using the Dice similarity coefficient (DSC) and the 95th percentile Hausdorff distance (HD95).
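The evaluation steps above can be sketched on binary voxel masks as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: `staple_binary` is a hypothetical helper implementing a basic binary STAPLE expectation-maximization with a fixed global foreground prior, and `hd95` pools both directed surface-distance sets before taking the 95th percentile. HD95 conventions differ between tools (pooled versus maximum of the two directed percentiles, voxel versus mesh surfaces), so values may not match other software exactly.

```python
import numpy as np
from scipy import ndimage

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / denom

def _surface(mask):
    """Boundary voxels: voxels in the mask that disappear after one erosion."""
    return mask & ~ndimage.binary_erosion(mask)

def hd95(a, b, spacing):
    """95th percentile symmetric surface distance (pooled), in units of `spacing`."""
    sa, sb = _surface(a.astype(bool)), _surface(b.astype(bool))
    if not sa.any() or not sb.any():
        raise ValueError("HD95 is undefined for an empty mask")
    # Distance from every voxel to the nearest surface voxel of the other mask.
    to_b = ndimage.distance_transform_edt(~sb, sampling=spacing)
    to_a = ndimage.distance_transform_edt(~sa, sampling=spacing)
    dists = np.concatenate([to_b[sa], to_a[sb]])
    return float(np.percentile(dists, 95))

def staple_binary(masks, n_iter=100, tol=1e-6, eps=1e-6):
    """Hypothetical minimal binary STAPLE: returns (consensus, sensitivities, specificities)."""
    d = np.stack([m.astype(bool).ravel() for m in masks], axis=1).astype(float)
    f = d.mean()  # fixed global prior probability of foreground
    if not 0.0 < f < 1.0:
        raise ValueError("masks must contain both foreground and background")
    p = np.full(d.shape[1], 0.99)  # per-rater sensitivity
    q = np.full(d.shape[1], 0.99)  # per-rater specificity
    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is truly foreground.
        a = f * np.prod(np.where(d == 1, p, 1 - p), axis=1)
        b = (1 - f) * np.prod(np.where(d == 0, q, 1 - q), axis=1)
        w = a / (a + b)
        # M-step: re-estimate each rater's sensitivity and specificity.
        p_new = np.clip((w[:, None] * d).sum(0) / w.sum(), eps, 1 - eps)
        q_new = np.clip(((1 - w)[:, None] * (1 - d)).sum(0) / (1 - w).sum(), eps, 1 - eps)
        done = max(np.abs(p_new - p).max(), np.abs(q_new - q).max()) < tol
        p, q = p_new, q_new
        if done:
            break
    return (w >= 0.5).reshape(masks[0].shape), p, q
```

In use, the three manual masks would be passed to `staple_binary`, and the resulting consensus compared with each simulated contrast's automatic contour via `dice` and `hd95`, with `spacing` set to the voxel size in millimetres so that HD95 is reported in mm.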
Results
Notable differences in the automatic contouring model's performance were seen across the contrast-weighted range, even within the T2-weighted range. Some models even performed as well as or better in subsets of the T1-weighted range. The PD-weighted range saw the worst performance. In some structures, the range of discrepancy exceeded 0.2 in DSC and 3.66 mm in HD95. In the T2-weighted contrast region on which the model was trained, the percentage of DSC values exceeding interobserver variability was 100%, 40%, 24%, and 57% for the left parotid, right parotid, left submandibular, and right submandibular glands, respectively.
Conclusions
This study demonstrates the variable performance of MRI-based automatic contouring algorithms across different TR and TE combinations. In future studies, this methodology could be used to evaluate model sensitivity, out-of-distribution detection ability, and performance drift.