Ground-Level Building Damage Segmentation Using a Patch-Based Approach with Global and Positional Embeddings

Abstract

In the context of ongoing urbanization and warfare, accurate building damage detection is a crucial challenge. While most segmentation approaches rely on overhead satellite imagery, the task remains underexplored from ground-level perspectives. To address this gap, we introduce a dataset of 290 side-view images of Ukrainian buildings, manually annotated with six classes. We propose a deep learning-based segmentation system that divides images into fixed-size patches and augments them with global ConvNeXt-Large image embeddings and positional embeddings that encode spatial location. The backbone follows a modified U-Net design in which the encoder is replaced by ResNet-50, SwinV2-Large, ConvNeXt-Large, Yolo11-seg, or DINOv2. We also evaluate a modified SegFormer-b5 for comparison, and study the effect of Felzenszwalb superpixel post-processing. Results show that U-Net with a DINOv2 encoder and embeddings achieves the best performance on all six classes (IoU = 0.4711; F1 = 0.7462), while U-Net with a ResNet-50 encoder and embeddings performs best on the three-class task (IoU = 0.7644; F1 = 0.8876). The embeddings contributed strongly to these gains: ResNet-50 improved by +5.28 pp F1 and +7.81 pp IoU on the 3-class task, and DINOv2 by +4.71 pp F1 and +4.65 pp IoU on the 6-class task.
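The abstract describes augmenting each patch with a global image embedding and a positional embedding. The sketch below illustrates that idea under stated assumptions: the exact positional-encoding scheme and feature dimensions are not given in the abstract, so a standard sinusoidal 2-D encoding and a hypothetical 256-d patch feature are used here; 1536 is the pooled feature size of ConvNeXt-Large.

```python
import numpy as np

def positional_embedding(row, col, dim=32):
    """Sinusoidal 2-D positional embedding for a patch at grid (row, col).
    Illustrative only; the paper's exact encoding is not specified here."""
    pe = np.zeros(dim)
    half = dim // 2
    for i in range(half // 2):
        freq = 1.0 / (10000 ** (2 * i / half))
        pe[2 * i]            = np.sin(row * freq)   # row channels
        pe[2 * i + 1]        = np.cos(row * freq)
        pe[half + 2 * i]     = np.sin(col * freq)   # column channels
        pe[half + 2 * i + 1] = np.cos(col * freq)
    return pe

def augment_patch(patch_features, global_embedding, row, col):
    """Concatenate per-patch features with the global image embedding
    and the patch's positional embedding."""
    pos = positional_embedding(row, col)
    return np.concatenate([patch_features, global_embedding, pos])

# Hypothetical shapes: 256-d patch features, 1536-d ConvNeXt-Large global embedding
patch = np.random.rand(256)
glob = np.random.rand(1536)
vec = augment_patch(patch, glob, row=2, col=3)
# vec has 256 + 1536 + 32 = 1824 dimensions
```

The concatenated vector would then be fed to the U-Net-style decoder alongside the encoder features; how exactly the fusion happens inside the network is a design detail the abstract does not specify.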
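The Felzenszwalb superpixel post-processing step can be applied by taking a majority vote of the predicted per-pixel classes inside each superpixel. This is a common refinement strategy, not necessarily the paper's exact procedure; the sketch below assumes the superpixel label map (e.g. from `skimage.segmentation.felzenszwalb`) has already been computed.

```python
import numpy as np

def majority_vote_refine(pred, superpixels):
    """Assign each superpixel the majority predicted class inside it.
    pred:        (H, W) integer class map from the segmentation model
    superpixels: (H, W) integer superpixel labels (e.g. Felzenszwalb)"""
    refined = pred.copy()
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        labels, counts = np.unique(pred[mask], return_counts=True)
        refined[mask] = labels[np.argmax(counts)]
    return refined

# Toy example: one noisy pixel inside superpixel 0 gets smoothed away
pred = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1],
                 [2, 2, 1, 1],
                 [2, 2, 1, 1]])
sp   = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [2, 2, 1, 1],
                 [2, 2, 1, 1]])
out = majority_vote_refine(pred, sp)
# The stray '1' at (1, 1) is flipped to the majority class 0 of superpixel 0
```

This kind of refinement trades fine boundary detail for spatial coherence, which is often what motivates superpixel post-processing in damage-segmentation pipelines.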