LVC2-DViT: Landview Creation for Landview Classification

Abstract

Remote sensing land-cover classification is impeded by limited annotated data and pronounced geometric distortion, which limits its value for environmental monitoring and land planning. We introduce LVC2‑DViT (Landview Creation for Landview Classification with Deformable Vision Transformer), an end‑to‑end framework evaluated on five Aerial Image Dataset (AID) scene types: Beach, Bridge, Pond, Port, and River. LVC2‑DViT combines two modules: (i) a data creation pipeline that converts ChatGPT-4o-generated textual scene descriptions into class‑balanced, high-fidelity images via Stable Diffusion, and (ii) DViT, a deformation‑aware Vision Transformer dedicated to land‑use classification, whose adaptive receptive fields model irregular landform geometries more faithfully. Without increasing model size, LVC2‑DViT improves Overall Accuracy by 2.13 percentage points and Cohen’s Kappa by 2.66 percentage points over a strong vanilla ViT baseline, and it also surpasses a FlashAttention variant. These results confirm the effectiveness of combining generative augmentation with deformable attention for robust land‑use mapping. The project is available here.
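For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Overall Accuracy and Cohen's Kappa are conventionally computed from a confusion matrix. It is not the authors' evaluation code; the function names and the toy 2x2 matrix are illustrative assumptions, and the numbers are invented for the example rather than taken from the article.

```python
from typing import Sequence


def overall_accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohens_kappa(cm: Sequence[Sequence[int]]) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(n)) / total  # observed agreement
    row_sums = [sum(cm[i]) for i in range(n)]  # true-class counts
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted counts
    # Expected agreement if predictions were independent of the true labels.
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy matrix (rows: true class, columns: predicted class).
toy_cm = [[40, 10],
          [5, 45]]

print(f"OA = {overall_accuracy(toy_cm):.2f}, kappa = {cohens_kappa(toy_cm):.2f}")
```

Because Kappa subtracts the agreement expected by chance, it is always at most the Overall Accuracy and drops well below it when the classifier is only modestly better than chance.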
