Automatic Pruning and Quality Assurance of Object Detection Datasets for Autonomous Driving
Abstract
Training Artificial Intelligence (AI) models requires large amounts of high-quality data, which is cumbersome to curate and to quality-assure through human intervention. Moreover, models trained on erroneous data (e.g., human errors or data faults) can cause significant problems in real-world applications. This paper proposes an automated cleaning framework and quality assurance strategy for 2D object detection datasets. The proposed cleaning method was designed according to the ISO/IEC 25012 data quality standard and uses multiple AI models to filter out anomalous and missing data. In addition, it balances statistical unevenness in the dataset, such as skewed class and object size distributions. This ensures the quality of the training dataset and makes it possible to examine how the amount of training data relates to detection performance. Experiments were conducted on popular autonomous driving datasets, including KITTI, Waymo, nuScenes, and publicly available datasets from South Korea. The automated data cleaning framework removed anomalous and redundant data, resulting in a reliable training dataset. The automated data pruning and assurance system was shown to substantially decrease the time and resources needed for manual data inspection.
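The abstract does not say how the outputs of the multiple AI models are combined. One plausible reading is a consensus check. A ground-truth box that few detectors reproduce may be an annotation anomaly. A box that several detectors agree on but the labels lack may be a missing annotation. The sketch below is a hypothetical illustration of that idea, not the paper's method. The `Box` type, the helper names `flag_suspect_labels` and `flag_missing_labels`, the IoU threshold of 0.5, the vote count of 2, and the use of the first model as the reference for missing-label candidates are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Box:
    """Hypothetical 2D box in (x1, y1, x2, y2) pixel coordinates with a class label."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def _matches(box: Box, boxes: Sequence[Box], iou_thr: float) -> bool:
    """True if any box in `boxes` has the same label and IoU >= iou_thr."""
    return any(b.label == box.label and iou(box, b) >= iou_thr for b in boxes)


def _votes(box: Box, model_preds: Sequence[Sequence[Box]], iou_thr: float) -> int:
    """Number of models whose predictions contain a match for `box`."""
    return sum(_matches(box, preds, iou_thr) for preds in model_preds)


def flag_suspect_labels(gt: Sequence[Box], model_preds: Sequence[Sequence[Box]],
                        iou_thr: float = 0.5, min_votes: int = 2) -> List[Box]:
    """Ground-truth boxes confirmed by fewer than `min_votes` models (possible anomalies)."""
    return [g for g in gt if _votes(g, model_preds, iou_thr) < min_votes]


def flag_missing_labels(gt: Sequence[Box], model_preds: Sequence[Sequence[Box]],
                        iou_thr: float = 0.5, min_votes: int = 2) -> List[Box]:
    """Boxes from the first (reference) model that at least `min_votes` models
    agree on but that match no ground-truth box (possible missing labels)."""
    if not model_preds:
        return []
    return [p for p in model_preds[0]
            if _votes(p, model_preds, iou_thr) >= min_votes
            and not _matches(p, gt, iou_thr)]
```

For example, with ground truth `[Box(0, 0, 10, 10, "car"), Box(50, 50, 60, 60, "pedestrian")]` and three detectors that all find the car but none of which find the pedestrian, `flag_suspect_labels` returns the pedestrian box for review. A simple vote over IoU matches is used here because it needs no model internals; a real pipeline might instead weight votes by detector confidence.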