A Comparative Study of an AI Model’s Robustness to Synthetic Data in Solving the Problem of Color Image Classification
Abstract
This study examines the impact of data augmentation on machine learning performance, focusing on how synthetic data influences various neural network architectures. Common issues such as limited data, class imbalance, and poor coverage often lead to low model metrics, and data augmentation is frequently used to address these problems. The research aims to identify the optimal proportion of synthetic data, assess its effects across different architectures, and analyze the impact of augmenting only specific classes in a multi-class medical image classification task. Twelve widely used architectures were selected for the experiments, including classical convolutional networks, visual transformers, and the hybrid ConvNeXt model. Results showed that no universal optimal augmentation ratio exists, as model robustness to synthetic data varies, even within the same architecture family. Transformer and hybrid models demonstrated greater stability, while convolutional networks exhibited inconsistent behavior, likely due to higher sensitivity to data bias.
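As a rough illustration of what "proportion of synthetic data" and "augmenting only specific classes" could mean in practice, the sketch below computes how many synthetic images to add per class. It is not the authors' procedure: it assumes the ratio is defined as synthetic / (real + synthetic), and the function names, class labels, and counts are hypothetical.

```python
from typing import Dict, Iterable, Optional


def synthetic_count(n_real: int, ratio: float) -> int:
    """Synthetic samples needed so they make up `ratio` of the final set.

    Assumed definition (not from the paper):
        ratio = n_synth / (n_real + n_synth)
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    return round(n_real * ratio / (1.0 - ratio))


def augmentation_plan(
    class_counts: Dict[str, int],
    ratio: float,
    target_classes: Optional[Iterable[str]] = None,
) -> Dict[str, int]:
    """Per-class synthetic counts; classes outside `target_classes` get 0.

    If `target_classes` is None, every class is augmented.
    """
    targets = None if target_classes is None else set(target_classes)
    plan = {}
    for label, n_real in class_counts.items():
        if targets is None or label in targets:
            plan[label] = synthetic_count(n_real, ratio)
        else:
            plan[label] = 0
    return plan


# Hypothetical class sizes, augmenting only the minority class.
counts = {"class_a": 100, "class_b": 300}
plan = augmentation_plan(counts, ratio=0.5, target_classes=["class_a"])
```

Sweeping `ratio` over a grid and retraining each architecture on the resulting mix is one way such a comparison could be organized. How the synthetic images themselves are generated is left outside this sketch.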