Deep learning-based remote sensing data retrieval: A cross-modal framework
Abstract
The acquisition of remote sensing (RS) images by a growing variety of sensors has necessitated the development of cross-sensor/cross-modal data retrieval techniques, since each sensor's imagery captures different physical properties of complex land-use/land-cover patterns. Many imaging modalities are available nowadays, each with its own advantages, and each sensor captures only a specific subset of the information from a swept area. Hence, the paradigm of cross-modal content-based image retrieval has attracted increasing research interest. To this end, deep neural network-based frameworks have been proposed for handling cross-modal retrieval in RS for multi-modal, multi-media, and multi-resolution data. In effect, the proposed models employ parallel feature extractors for the two modalities, which project the data into a shared latent feature space from which retrieval is performed. However, the main challenge with the conventional retrieval approach is that a query image of the target class is often not available. There may also be a considerable number of classes for which training samples are scarce or absent. For such classes, a zero-shot learning (ZSL) strategy can be helpful: ZSL aims to solve a task without seeing any example of the target class during the training phase. To this end, two efficient algorithms have been proposed that perform sketch-based inter-modal retrieval and are robust under the zero-shot setting; the network can handle a sample from an unseen class (i.e., a new class) encountered during the inference phase, after deployment. Delving further into zero-shot cross-modal retrieval (ZS-CMR), a key problem that limits the training and overall performance of such models in RS applications is their requirement for large annotated training databases.
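The shared-latent-space idea described above can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the architecture from the thesis: the random linear projections stand in for trained deep encoders, and the feature dimensions and cosine-similarity ranking are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy modality-specific features (e.g., optical database vs. a SAR query).
optical_feats = rng.normal(size=(5, 128))   # database: 5 images, 128-D descriptors
sar_query = rng.normal(size=(1, 64))        # query from the other modality, 64-D

# Modality-specific "encoders": random linear projections into a common
# 32-D latent space, standing in for trained deep feature extractors.
W_optical = rng.normal(size=(128, 32))
W_sar = rng.normal(size=(64, 32))

def encode(x, W):
    """Project features into the shared latent space and L2-normalise."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

db = encode(optical_feats, W_optical)
q = encode(sar_query, W_sar)

# Cross-modal retrieval: rank database items by cosine similarity to the query.
scores = (q @ db.T).ravel()
ranking = np.argsort(-scores)
print("retrieval order:", ranking)
```

In a trained system the two projections would be learned jointly so that semantically matching samples from either sensor land close together in the shared space; retrieval then reduces to nearest-neighbour search, as above.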
One of the major bottlenecks in training such models is that they require a considerable amount of labelled data, and acquiring labelled data is both resource-intensive and labour-intensive. This limits the training of ZS-CMR models in practical applications. To this end, a new experimental protocol has been proposed for models that can be trained on very few labelled samples while remaining robust enough to retrieve unseen-class queries; this is referred to as the few-zero-shot (FZS) protocol. A bi-level siamese network has been proposed to solve this task. The proposed network uses a semi-supervised training strategy for mapping the cross-domain samples of the two different sensors. Thus, in this thesis, the problems of uni-modal data retrieval, cross-modal data retrieval, zero-shot cross-modal data retrieval, and the novel problem of few-zero-shot cross-modal data retrieval have been investigated, and deep learning-based solutions have been proposed for each task.
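A common ingredient of siamese training for cross-domain mapping is a contrastive loss over paired samples. The sketch below is a generic NumPy version under assumed embeddings and margin, not the thesis's exact bi-level or semi-supervised formulation; the 4-D vectors and labels are fabricated toy values.

```python
import numpy as np

def contrastive_loss(z_a, z_b, same_class, margin=1.0):
    """Generic siamese contrastive loss over paired embeddings.

    Pulls embeddings of matching cross-domain pairs together and pushes
    non-matching pairs at least `margin` apart in Euclidean distance.
    """
    d = np.linalg.norm(z_a - z_b, axis=1)                    # pairwise distances
    pos = same_class * d ** 2                                # matches: shrink distance
    neg = (1 - same_class) * np.maximum(margin - d, 0.0) ** 2  # non-matches: enforce margin
    return float(np.mean(pos + neg))

# Toy cross-domain pairs (e.g., sketch embedding vs. image embedding).
z_sketch = np.array([[0.0, 0.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0, 0.0]])
z_image = np.array([[0.1, 0.0, 0.0, 0.0],   # close to its sketch, labelled match
                    [0.9, 1.1, 0.0, 0.0]])  # close too, but labelled non-match
labels = np.array([1.0, 0.0])               # 1 = same class, 0 = different class

loss = contrastive_loss(z_sketch, z_image, labels)
```

Minimising such a loss over many pairs is one standard way to align the two sensors' embedding spaces; the second (non-matching) pair above dominates the loss because it violates the margin.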