Impact of Data Quality on CNN-Based Sewer Defect Detection


Abstract

Sewer pipelines are essential urban infrastructure and play a key role in sanitation and disaster prevention. Regular condition assessment is necessary to detect defects early and to schedule maintenance at the right time. However, traditional visual inspection of CCTV footage is time-consuming, labor-intensive, and dependent on subjective human judgment. To address these limitations, this study proposes an automated defect classification model based on a convolutional neural network (CNN). The model was trained on a large-scale public dataset of approximately 470,000 sewer images provided by AI-Hub and was designed to classify each image as non-defect or as one of three major defect categories. The model is built on the ResNet50 architecture and incorporates dropout and L2 regularization to reduce overfitting. In the experiments, the highest accuracy, 92.75%, was obtained with a dropout rate of 0.2 and a regularization coefficient of 0.01. Further analysis revealed that mislabeled, redundant, or obscured images in the dataset degraded model performance. Additional experiments quantified the effect of data quality on accuracy, underscoring the importance of careful dataset curation. The study offers practical guidance for building high-performance, data-driven models for automated sewer defect detection.
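
The abstract does not say how redundant images were identified. As a minimal sketch of one possible curation step, the Python snippet below flags byte-identical images by hashing their contents. The function names `find_exact_duplicates` and `deduplicate` and the sample file names are hypothetical. The approach catches only exact copies, not near-duplicate frames from the same CCTV pass, and it is not presented as the authors' method.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(images):
    """Group image names whose raw bytes are identical.

    `images` maps a file name to its raw bytes (hypothetical input format).
    Returns a list of groups, each a sorted list of names sharing one hash.
    """
    groups = defaultdict(list)
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        groups[digest].append(name)
    # Keep only hashes that occur more than once, i.e. redundant images.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


def deduplicate(images):
    """Return the names to keep: the first name (in sorted order) per hash."""
    seen = set()
    kept = []
    for name in sorted(images):
        digest = hashlib.sha256(images[name]).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Toy stand-ins for image files; real use would read bytes from disk.
    sample = {
        "frame_001.jpg": b"\x00\x01\x02",
        "frame_002.jpg": b"\x00\x01\x02",  # exact copy of frame_001
        "frame_003.jpg": b"\x09\x08\x07",
    }
    print(find_exact_duplicates(sample))
    print(deduplicate(sample))
```

Mislabeled and obscured images cannot be found by hashing, so a filter like this would cover only one of the three data-quality problems named above.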
