Label Noise in Pathological Segmentation is Overlooked, Leading to Potential Overestimation of Artificial Intelligence Models
Abstract
Artificial intelligence (AI) has transformed medical imaging, driving advancements in radiology and endoscopy. Semantic segmentation, a pixel-level technique crucial for delineating pathological features, has become a cornerstone of digital pathology. Pathology segmentation AI models are often trained using annotations generated by pathologists. Despite the meticulous care pathologists typically exercise, these annotations frequently contain label noise in practice. However, the specific types of label noise in pathology data and their impact on AI model training remain inadequately explored. This study systematically investigated the effects of label noise on the performance of pathology segmentation models. Using publicly available datasets and a breast cancer semantic segmentation dataset, modules were developed to simulate four types of artificial label noise at varying intensity levels. These datasets were used to train deep learning models with encoder-decoder architectures, and their performance was evaluated using metrics such as the Dice coefficient, precision, recall, and intersection over union. The results indicated that models were highly susceptible to overfitting label noise, particularly boundary-dependent noise such as dilation and shrinkage. Discrepancies were identified between apparent performance scores obtained under real-world conditions (noisy test labels) and true performance scores derived using clean test data. This overestimation risk was most pronounced for datasets containing boundary-altering noise. Furthermore, random combinations of noise types and levels significantly impaired model generalization. This study underscores the critical importance of addressing label noise in pathology datasets. It is proposed that future efforts focus on developing standardized methods for quantifying and mitigating label noise, along with creating robust benchmarks using noise-inclusive datasets. Enhancing annotation quality and addressing label noise can improve the reliability and generalizability of AI in pathology, facilitating broader clinical adoption.
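To make the boundary-dependent noise types concrete, the sketch below simulates dilation and shrinkage noise on a binary annotation mask with standard morphological operations and measures the resulting label disagreement with the Dice coefficient. The function names, the `mode`/`iterations` parameters, and the toy mask are illustrative assumptions, not the paper's actual noise modules.

```python
import numpy as np
from scipy import ndimage


def add_boundary_noise(mask, mode="dilate", iterations=3):
    """Simulate boundary-dependent label noise on a binary mask.

    Illustrative sketch: "dilate" expands the annotated region,
    "shrink" erodes it; `iterations` acts as the noise intensity.
    """
    op = ndimage.binary_dilation if mode == "dilate" else ndimage.binary_erosion
    return op(mask, iterations=iterations).astype(mask.dtype)


def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks (1.0 = identical)."""
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)


# Toy example: a square "lesion" annotation on a 64x64 patch.
mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:44, 20:44] = 1

noisy = add_boundary_noise(mask, mode="dilate", iterations=3)
print(round(dice(noisy, mask), 3))  # Dice between noisy and clean labels
```

Comparing a model's score against such a noisy reference illustrates the overestimation risk described above: a model that overfits dilated labels can score highly against them while its true Dice against the clean mask is lower.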