Ground-Level Building Damage Segmentation Using a Patch-Based Approach with Global and Positional Embeddings
Abstract
In the context of ongoing urbanization and warfare, accurate building damage detection is a crucial challenge. While most segmentation approaches rely on overhead satellite imagery, the task from ground-level perspectives remains underexplored. To address this gap, we introduce a dataset of 290 side-view images of Ukrainian buildings, manually annotated with six classes. We propose a deep learning-based segmentation system that divides images into fixed-size patches and augments them with global ConvNeXt-Large image embeddings and positional embeddings that encode each patch's spatial location. The backbone follows a modified U-Net design, where the encoder is replaced by ResNet-50, SwinV2-Large, ConvNeXt-Large, YOLO11-seg, or DINOv2. For comparison, we also evaluate a modified SegFormer-b5, and we study the effect of Felzenszwalb superpixel post-processing. Results show that U-Net with a DINOv2 encoder and embeddings achieves the best performance on all six classes (IoU = 0.4711; F1 = 0.7462), while U-Net with a ResNet-50 encoder and embeddings performs best on the three-class task (IoU = 0.7644; F1 = 0.8876). The embeddings contributed strongly to these gains: ResNet-50 improved by +5.28 pp F1 and +7.81 pp IoU on the 3-class task, and DINOv2 by +4.71 pp F1 and +4.65 pp IoU on the 6-class task.
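To illustrate the patch-based design described above, the sketch below shows one plausible way to cut an image into fixed-size patches and fuse each patch's encoder features with a global image embedding and a positional embedding. It is a minimal PyTorch sketch under assumptions: the patch size, embedding dimensions, grid size, and the `PatchContextFusion` module are hypothetical and are not taken from the paper's implementation.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the paper's exact values are not stated in the abstract.
PATCH = 256          # fixed-size square patch side length
GLOBAL_DIM = 1536    # pooled ConvNeXt-Large feature dimension
POS_DIM = 64         # learned positional embedding dimension


def extract_patches(image: torch.Tensor, patch: int = PATCH):
    """Split a (C, H, W) image into non-overlapping fixed-size patches.

    Returns the patches as (N, C, patch, patch) together with their (row, col)
    grid positions, so a positional embedding can be attached to each patch.
    """
    c, h, w = image.shape
    rows, cols = h // patch, w // patch
    patches, positions = [], []
    for r in range(rows):
        for col in range(cols):
            patches.append(image[:, r * patch:(r + 1) * patch,
                                 col * patch:(col + 1) * patch])
            positions.append((r, col))
    return torch.stack(patches), torch.tensor(positions)


class PatchContextFusion(nn.Module):
    """Illustrative fusion of per-patch features with global and positional context.

    The global embedding of the whole image and a learned embedding of the
    patch's grid position are broadcast over the patch feature map and
    concatenated channel-wise before a 1x1 convolution projects back to the
    original channel count.
    """

    def __init__(self, patch_channels: int, grid_size: int = 16):
        super().__init__()
        self.row_emb = nn.Embedding(grid_size, POS_DIM // 2)
        self.col_emb = nn.Embedding(grid_size, POS_DIM // 2)
        self.fuse = nn.Conv2d(patch_channels + GLOBAL_DIM + POS_DIM,
                              patch_channels, kernel_size=1)

    def forward(self, patch_feat, global_emb, positions):
        # patch_feat: (N, C, h, w) encoder features for each patch
        # global_emb: (GLOBAL_DIM,) pooled embedding of the full image
        # positions:  (N, 2) integer (row, col) of each patch in the grid
        n, _, h, w = patch_feat.shape
        pos = torch.cat([self.row_emb(positions[:, 0]),
                         self.col_emb(positions[:, 1])], dim=1)     # (N, POS_DIM)
        ctx = torch.cat([global_emb.expand(n, -1), pos], dim=1)     # (N, GLOBAL_DIM + POS_DIM)
        ctx = ctx[:, :, None, None].expand(-1, -1, h, w)            # broadcast over spatial dims
        return self.fuse(torch.cat([patch_feat, ctx], dim=1))
```

The Felzenszwalb superpixel post-processing mentioned in the abstract could be approximated with `skimage.segmentation.felzenszwalb`, for example by majority-voting the predicted class labels within each superpixel; the paper's exact post-processing parameters are not given here.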