Zero-shot adapter framework for cross-modal classification of remote sensing imagery

Abstract

Vision-language foundation models show significant potential for open-world remote sensing applications, yet they face considerable challenges, including the limited expressiveness of generic prompts, the scarcity of annotated data, and insufficient feature extraction. To address these challenges, we introduce a novel zero-shot adapter framework for cross-modal classification of remote sensing imagery. The framework integrates three essential components: (1) an LLM-Augmented Prompt Generalization module, designed specifically for remote sensing classes, enriches the semantic depth of textual prompts with domain-specific knowledge from large language models (LLMs), yielding more contextually grounded class descriptions; (2) a Proxy-Enhanced Support Set Construction mechanism generates pseudo-labeled support sets, addressing the critical shortage of annotated data and providing a robust path for knowledge expansion; and (3) a Multi-Granularity Feature Cache stores both local (patch-level) and global (scene-level) features and combines the cached knowledge with zero-shot CLIP predictions, bridging the semantic gap between the image and text domains. These components interact synergistically: the LLM-augmented prompts and proxy support sets strengthen semantic grounding, while feature caching and proxy learning jointly compensate for insufficient feature extraction. The proposed framework is particularly effective in resource-constrained environments. Experimental evaluations on five benchmark datasets demonstrate strong zero-shot and few-shot prediction performance, improving over existing cross-modal methods.
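To make the cache-plus-zero-shot fusion described above concrete, the sketch below illustrates one plausible reading of the Multi-Granularity Feature Cache idea: training-free, Tip-Adapter-style fusion of cached support features with zero-shot CLIP logits. The dimensions, the `alpha`/`beta` hyperparameters, and the random placeholder tensors standing in for CLIP encoder outputs are all illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes (assumptions, not from the paper).
num_classes, feat_dim, shots = 10, 512, 4

# Placeholder tensors standing in for CLIP encoder outputs,
# L2-normalized as CLIP embeddings conventionally are.
text_feats = F.normalize(torch.randn(num_classes, feat_dim), dim=-1)  # encoded LLM-augmented prompts
cache_keys = F.normalize(torch.randn(num_classes * shots, feat_dim), dim=-1)  # pseudo-labeled support images
cache_vals = torch.eye(num_classes).repeat_interleave(shots, dim=0)  # one-hot pseudo-labels
query_feat = F.normalize(torch.randn(1, feat_dim), dim=-1)  # scene-level feature of a test image

# Zero-shot CLIP logits: scaled cosine similarity between the
# query image feature and the class prompt embeddings.
clip_logits = 100.0 * query_feat @ text_feats.t()

# Cache logits: affinity between the query and cached support
# features, sharpened by beta and mapped to class scores via the
# pseudo-label value matrix (Tip-Adapter-style adaptation).
beta, alpha = 5.5, 1.0  # hypothetical values
affinity = query_feat @ cache_keys.t()
cache_logits = ((-beta) * (1.0 - affinity)).exp() @ cache_vals

# Fuse cached knowledge with the zero-shot prediction.
logits = clip_logits + alpha * cache_logits
pred = logits.argmax(dim=-1)
print(pred)
```

In this reading, a patch-level cache would follow the same fusion rule with patch features as keys; the weight `alpha` then balances cached support knowledge against the zero-shot prior.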
