Reliable CNN Evaluation in Medical Imaging via Variance-Aware Cross-Validation
Abstract
Reliable evaluation and generalizable hyperparameter selection remain critical challenges in deep learning–based medical image analysis, particularly under limited, imbalanced, and heterogeneous data conditions. This paper proposes a Variance-Aware K-Fold Cross-Validation framework for robust hyperparameter optimization of convolutional neural networks (CNNs). Unlike conventional single-run or mean-based cross-validation strategies, the proposed framework introduces a variance-regularized objective function that jointly maximizes mean validation performance while explicitly penalizing fold-to-fold variability, thereby promoting stability and generalization. The approach is systematically integrated with Bayesian optimization and Tree-structured Parzen Estimator (TPE) methods and evaluated across multiple optimization libraries, demonstrating its library-agnostic applicability. Extensive experiments under varying K-Fold configurations show that variance-aware optimization consistently mitigates the optimistic bias of single-run evaluations and identifies hyperparameter configurations with superior robustness and reproducibility. A theoretical analysis further establishes variance-aware generalization error bounds and a reliability ordering principle, providing formal justification for the proposed optimization criterion. Empirical validation on a multi-class breast ultrasound imaging dataset confirms improved performance stability and reduced variance across folds. Overall, the proposed framework offers a principled, reproducible, and architecture-independent evaluation strategy that enhances the reliability of CNN-based medical imaging systems and is readily extensible to other data-limited clinical applications.
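The variance-regularized objective described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `variance_aware_objective`, the penalty weight `lam`, and the example fold scores are all assumptions for demonstration; the paper's exact penalty form may differ.

```python
import statistics

def variance_aware_objective(fold_scores, lam=1.0):
    """Score a hyperparameter configuration by mean validation
    performance minus a penalty on fold-to-fold variability.
    (Illustrative sketch; penalty form and weight are assumptions.)"""
    mean = statistics.mean(fold_scores)
    std = statistics.pstdev(fold_scores)  # spread across K folds
    return mean - lam * std

# Two hypothetical 5-fold validation-accuracy profiles:
stable   = [0.90, 0.91, 0.89, 0.90, 0.90]  # slightly lower mean, low variance
unstable = [0.98, 0.84, 0.95, 0.82, 0.96]  # higher mean, high fold-to-fold variance

# Mean-only selection would prefer the unstable configuration...
assert statistics.mean(unstable) > statistics.mean(stable)
# ...while the variance-aware objective prefers the stable one.
assert variance_aware_objective(stable) > variance_aware_objective(unstable)
```

In a Bayesian-optimization or TPE loop, this scalar would simply replace the usual mean-of-folds score as the quantity being maximized, which is what makes the criterion library-agnostic.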