Vision Transformer based Damage Assessment from Post-Disaster Satellite Imagery: An Applied Study on Hurricane Harvey


Abstract

The timely and accurate evaluation of building damage is vital for post-disaster response. This study evaluates the efficacy of a Vision Transformer (ViT-B32) against a baseline Convolutional Neural Network (EfficientNet-B0) for the binary classification of damaged versus undamaged buildings, using 128×128 RGB satellite imagery from Hurricane Harvey. Whereas CNNs encode local features, ViT architectures rely on self-attention mechanisms to capture global spatial relationships, a property that is crucial for spotting intricate damage patterns in visually noisy disaster scenes. Experiments show that the ViT model outperforms the CNN baseline in classification accuracy (97.85% vs 96.90%) and proves more robust to class imbalance, achieving higher F1-score and AUC than the baseline. Furthermore, the study highlights the model's interpretability: we generate attention heatmaps that visualize the specific image regions driving the classification decisions. These visualizations provide actionable insights by precisely localizing structural damage, thereby offering a valuable tool for prioritizing recovery efforts in disaster management workflows.
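The self-attention mechanism that the abstract contrasts with local convolutions can be sketched in a few lines. The snippet below is an illustrative single-head attention over patch embeddings (not the paper's implementation); the patch count follows the abstract's setup, where a 128×128 image split into 32×32 patches (ViT-B32) yields a 4×4 grid of 16 patches, and the projection size `d` and random weights are placeholder assumptions. The attention matrix it returns is the quantity that attention heatmaps visualize.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of patch embeddings.

    x: (n_patches, d). Every patch attends to every other patch,
    giving the global receptive field the abstract contrasts with
    the local receptive field of a convolution.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])       # (n, n) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax: each row sums to 1
    return attn @ v, attn                         # attn feeds the heatmaps

# 128x128 image, 32x32 patches -> 4x4 grid = 16 patches (ViT-B32 setting).
# d=64 and the random projections are illustrative, not from the paper.
rng = np.random.default_rng(0)
n_patches, d = 16, 64
x = rng.standard_normal((n_patches, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) * d**-0.5 for _ in range(3))
out, attn = self_attention(x, Wq, Wk, Wv)
```

Reshaping a row of `attn` back to the 4×4 patch grid and upsampling it over the input image is, in essence, how the attention heatmaps localize the regions driving a classification decision.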
