Overcoming Data Heterogeneity in Breast Ultrasound: A ResNet50V2-Based Solution with Enhanced Class Balancing
Abstract
In breast cancer diagnosis, accurate and efficient analysis of ultrasound images is essential, particularly for distinguishing benign from malignant tumors. Ultrasound imaging is a widely used, non-invasive diagnostic tool that provides valuable insights into tumor characteristics. This study presents a deep learning approach for breast cancer classification using transfer learning (TL) models across three distinct datasets: BUSI, BUSI-UCLM, and GDPH&SYSUCC. A significant challenge was the inherent class imbalance within these datasets. To mitigate it, several data sampling techniques were evaluated, with Random Over Sampling (ROS) emerging as the most effective method for balancing the data. Among the TL models assessed, the ResNet50V2 architecture consistently demonstrated superior performance across all metrics. When trained and validated on the individual datasets, the ResNet50V2 model achieved an accuracy of 95.44%, an F1-score of 95.25%, and an AUC of 99.21% on the BUSI dataset; 97.22% accuracy, 97.32% F1-score, and 99.90% AUC on the BUSI-UCLM dataset; and 95.72% accuracy, 95.72% F1-score, and 98.37% AUC on the GDPH&SYSUCC dataset. Following this individual evaluation, a combined dataset was created, consisting of 3971 images distributed across benign (1497), malignant (1819), and normal (247) classes. On this combined, more challenging dataset, the ROS-enhanced ResNet50V2 model maintained its strong performance, achieving a final accuracy of 93.96%, an F1-score of 93.99%, and an AUC of 98.70%. These results highlight the efficacy of ROS in addressing class imbalance and the robustness of ResNet50V2 as a transfer learning backbone for breast cancer classification across heterogeneous ultrasound datasets. The findings underscore the potential of this approach to enhance diagnostic accuracy and support clinical decision-making.
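To make the class-balancing step concrete, the following is a minimal sketch of random oversampling in plain Python. It is an illustration of the general technique, not the authors' implementation: the function name `random_oversample`, the integer stand-ins for images, and the fixed seed are all assumptions, and only the per-class counts are taken from the abstract. Minority-class samples are duplicated at random until every class matches the size of the largest class.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class.
    (Hypothetical helper for illustration only.)"""
    rng = random.Random(seed)

    # Group samples by class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest one.
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Per-class counts as reported in the abstract for the combined dataset.
counts = {"benign": 1497, "malignant": 1819, "normal": 247}
labels = [name for name, n in counts.items() for _ in range(n)]
samples = list(range(len(labels)))  # integer stand-ins for image files

balanced_samples, balanced_labels = random_oversample(samples, labels)
balanced_counts = Counter(balanced_labels)
print(balanced_counts)
```

In this sketch each class ends up with 1819 entries, the size of the malignant class. Oversampling of this kind is usually applied to the training split only, so that duplicated images do not also appear in the validation or test data; the abstract does not state how the splits were handled.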