Assessing Pneumonia in Chest X-ray Images Using a Modified VGG16 Model: A Comparative Study of Data Sampling Techniques
Abstract
Background: Imbalanced datasets pose significant challenges in deep learning-based medical image classification, often leading to biased predictions. This study evaluates the impact of various data sampling techniques and Bayesian hyperparameter optimization on the performance of the VGG16 model in diagnosing pneumonia from the Chest X-ray Pneumonia dataset.

Methods: Six approaches were assessed: VGG16 with Random Under Sampling (RUS), VGG16 with Equal Sampling, VGG16 with Random Over Sampling (ROS), VGG16 with Synthetic Minority Over-sampling Technique (SMOTE), VGG16 with Adaptive Synthetic Sampling (ADASYN), and VGG16 with Bayesian Hyperparameter Optimization. Each technique was implemented to balance the dataset, and model performance was evaluated based on training accuracy.

Results: Training accuracy varied across the sampling techniques. ROS achieved the highest value at 99.85%, followed by RUS (99.34%) and ADASYN (99.19%). SMOTE and Bayesian Hyperparameter Optimization reached 98.31% and 98.53%, respectively, while Equal Sampling had the lowest training accuracy at 93.86%. These results indicate that oversampling methods, particularly ROS and ADASYN, significantly enhance model learning, while Bayesian optimization offers a stable alternative without surpassing the best-performing oversampling techniques.

Conclusion: This comparative analysis highlights the impact of different data-balancing strategies on pneumonia detection using deep learning. Oversampling methods, especially ROS and ADASYN, proved highly effective in improving training accuracy, whereas Bayesian hyperparameter optimization provided stability but did not outperform the top oversampling techniques. Selecting an appropriate sampling strategy is crucial for enhancing the performance of deep learning models in medical image classification.
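
To make the two simplest balancing strategies named above concrete, the following is a minimal sketch of random over-sampling (ROS) and random under-sampling (RUS) on a list of labeled items. It is not the authors' implementation: the function names, the toy file names, and the 3-versus-9 class split are all hypothetical and chosen only for illustration. ROS duplicates randomly chosen minority-class items until every class matches the largest one, while RUS discards randomly chosen majority-class items until every class matches the smallest one. SMOTE and ADASYN differ in that they synthesize new minority samples by interpolating between existing ones rather than duplicating them, and they are not shown here.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Group samples into a dict keyed by class label."""
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    return by_class


def random_over_sample(samples, labels, seed=0):
    """ROS: duplicate randomly chosen items until all classes match the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def random_under_sample(samples, labels, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw without replacement so no item is repeated.
        out_samples.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Hypothetical toy data: 3 "NORMAL" and 9 "PNEUMONIA" file names (not real dataset counts).
toy_samples = [f"normal_{i}.jpeg" for i in range(3)] + [f"pneumonia_{i}.jpeg" for i in range(9)]
toy_labels = ["NORMAL"] * 3 + ["PNEUMONIA"] * 9

ros_samples, ros_labels = random_over_sample(toy_samples, toy_labels)
rus_samples, rus_labels = random_under_sample(toy_samples, toy_labels)

print("original:", Counter(toy_labels))
print("after ROS:", Counter(ros_labels))
print("after RUS:", Counter(rus_labels))
```

In this sketch ROS keeps every original item and only adds duplicates, whereas RUS throws data away, which is one plausible reason the two can behave differently when the balanced set is used to train a large network such as VGG16.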