EfficientCrackFusion: An EfficientNet-ViT Hybrid with Cross-Attention Fusion for Concrete Crack Detection
Abstract
Detecting cracks in concrete and pavement structures is essential for preserving the integrity of infrastructure, but manual inspection is slow and subject to inspector bias. Deep learning methods automate feature extraction for crack detection with convolutional neural networks (CNNs), yet the limited receptive fields of these networks restrict them to short-range patterns. Vision Transformers (ViTs) address this limitation through self-attention, but used on their own they demand extensive computational resources. EfficientCrackFusion is a novel hybrid architecture that uses a Cross-Attention Fusion Gate and a Squeeze-and-Excitation block to combine EfficientNet-B0 and ViT-Base backbones for efficient local and global feature representation. Perceptual hashing combined with FAISS similarity grouping is used to build a group-aware data split that prevents data leakage between splits and supports an unbiased evaluation. In experiments on 40,000 images, the model achieved 99.11% accuracy and an F1-score of 0.9947, surpassing existing models while remaining compact enough for outdoor structural health assessments.
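The group-aware split mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which perceptual hash, distance threshold, or split ratio was used, so everything below is an assumption: a simple average hash stands in for the perceptual hash, a brute-force pairwise Hamming comparison stands in for the FAISS similarity search, and the function names (`average_hash`, `group_near_duplicates`, `group_aware_split`) and parameters (`max_distance`, `test_fraction`) are hypothetical. The point is only to show the idea: near-duplicate images are grouped first, and whole groups are then assigned to one side of the split.

```python
import random
from itertools import combinations


def average_hash(pixels):
    """Hash a small grayscale image (list of rows) to a tuple of bits.

    Each bit is 1 where the pixel is brighter than the image mean.
    This is an assumed stand-in for the paper's perceptual hash.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def group_near_duplicates(hashes, max_distance=2):
    """Group image indices whose hashes are within max_distance bits.

    Brute-force pairwise comparison with union-find; an assumed
    substitute for a FAISS similarity search on larger datasets.
    """
    parent = list(range(len(hashes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(hashes)), 2):
        if hamming(hashes[i], hashes[j]) <= max_distance:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(hashes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def group_aware_split(groups, test_fraction=0.2, seed=0):
    """Assign whole groups to train or test so no group straddles both.

    Groups are shuffled, then added to the test set while they still fit
    within test_fraction of all items; the rest go to train.
    """
    rng = random.Random(seed)
    shuffled = list(groups)
    rng.shuffle(shuffled)
    total = sum(len(g) for g in shuffled)
    train, test = [], []
    for g in shuffled:
        if len(test) + len(g) <= test_fraction * total:
            test.extend(g)
        else:
            train.extend(g)
    return train, test
```

Because assignment happens per group rather than per image, an image and its near-duplicates can never end up on opposite sides of the split, which is the leakage the abstract is concerned with. The brute-force comparison is quadratic in the number of images, which is why an approximate or indexed nearest-neighbour search becomes attractive at the scale of tens of thousands of images.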