LVC2-DViT: Landview Creation for Landview Classification
Abstract
Remote sensing land-cover classification is impeded by limited annotated data and pronounced geometric distortion, which limits its value for environmental monitoring and land planning. We introduce LVC2-DViT (Landview Creation for Landview Classification with Deformable Vision Transformer), an end-to-end framework evaluated on five Aerial Image Dataset (AID) scene types: Beach, Bridge, Pond, Port and River. LVC2-DViT combines two modules: (i) a data creation pipeline that converts ChatGPT-4o-generated textual scene descriptions into class-balanced, high-fidelity images via Stable Diffusion, and (ii) DViT, a deformation-aware Vision Transformer dedicated to land-use classification whose adaptive receptive fields more faithfully model irregular landform geometries. Without increasing model size, LVC2-DViT improves Overall Accuracy by 2.13 percentage points and Cohen’s Kappa by 2.66 percentage points over a strong vanilla ViT baseline, and also surpasses a FlashAttention variant. These results confirm the effectiveness of combining generative augmentation with deformable attention for robust land-use mapping. The project is available here.
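For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Overall Accuracy and Cohen's Kappa are conventionally computed from a confusion matrix. It is not taken from the project's code; the function name `overall_accuracy_and_kappa` and the toy two-class matrix are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def overall_accuracy_and_kappa(
    confusion: Sequence[Sequence[int]],
) -> Tuple[float, float]:
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are assumed to be true classes and columns predicted classes.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total

    # Chance agreement: sum over classes of (row share) * (column share).
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total ** 2)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Toy two-class matrix (made-up numbers, not results from the paper).
toy_matrix = [
    [40, 10],
    [5, 45],
]
oa, kappa = overall_accuracy_and_kappa(toy_matrix)
print(f"OA = {oa:.2%}, kappa = {kappa:.2%}")
```

Because Kappa subtracts the agreement expected by chance, it is always at or below Overall Accuracy for the same predictions, which is why the two metrics are usually reported together.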