Does Normalisation Approach Affect Generalisability? A Comparison of Methods for 3D Knee MRI Segmentation
Abstract
Robust out-of-the-box performance is essential for the clinical deployment of deep learning models in medical imaging. An important but underexplored factor affecting model generalisability is intensity normalisation, particularly for magnetic resonance imaging (MRI), where image intensities vary across scanners and protocols. In this study, we systematically compared seven normalisation methods and their impact on the performance of a 3D U-Net model for meniscus segmentation from knee MRI. The methods included standard scaling approaches, histogram-based techniques, and a novel tissue-specific method using Gaussian Mixture Model (GMM) fitting based on brain MRI white stripe normalisation. Models were trained on the IWOAI 2019 dataset and evaluated on both internal and external test sets (SKM-TEA) to assess generalisability. Performance was similar internally, but differences were significant on external data, with Z-score, Nyúl histogram matching, and CLAHE showing greater robustness than other methods. The GMM method performed well internally but was less effective on external data due to differences in intensity profiles across datasets. Overall, while normalisation methods provided some benefit in mitigating domain shift, the differences in performance were small compared to the significant drop observed between datasets, indicating that more work is needed to address the challenges posed by domain shift in medical imaging.
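To make the idea of intensity normalisation concrete, the sketch below shows Z-score normalisation of a 3D volume, one of the standard scaling approaches named in the abstract. It is an illustration only and not the study's implementation: the function name `zscore_normalise`, the optional `mask` argument, and the synthetic volumes `scan_a` and `scan_b` are all assumptions introduced here.

```python
import numpy as np


def zscore_normalise(volume, mask=None):
    """Rescale a volume to zero mean and unit standard deviation.

    Hypothetical helper: if `mask` (a boolean array) is given, the mean and
    standard deviation are estimated from the masked voxels only, but the
    whole volume is rescaled.
    """
    volume = np.asarray(volume, dtype=np.float64)
    voxels = volume if mask is None else volume[mask]
    mean = voxels.mean()
    std = voxels.std()
    if std == 0:
        # A constant volume carries no contrast; return zeros rather than divide by zero.
        return np.zeros_like(volume)
    return (volume - mean) / std


# Synthetic stand-in for two acquisitions of the same anatomy (assumption):
# scan_b differs from scan_a only by a scanner-like gain and offset.
rng = np.random.default_rng(0)
scan_a = rng.gamma(shape=2.0, scale=50.0, size=(8, 16, 16))
scan_b = 3.0 * scan_a + 100.0

norm_a = zscore_normalise(scan_a)
norm_b = zscore_normalise(scan_b)

# A positive gain and an offset are removed exactly by Z-scoring.
print(np.allclose(norm_a, norm_b))
```

Z-scoring removes only a global gain and offset. Real differences between scanners and protocols are generally not purely affine, which is consistent with the abstract's finding that normalisation mitigates domain shift only in part.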