Deep learning-based remote sensing data retrieval: A cross-modal framework
Abstract
The acquisition of remote sensing (RS) images by a growing variety of sensors has necessitated the development of cross-sensor/cross-modal data retrieval techniques, since each sensor's imagery captures different physical properties of complex land-use/land-cover patterns. Many imaging modalities are available nowadays, each with its own advantages, and each sensor captures only a specific subset of the information from a swept area. Hence, the paradigm of cross-modal content-based image retrieval has attracted increasing research interest. To this end, deep neural network-based frameworks have been proposed for handling cross-modal retrieval in RS for multi-modal, multi-media, and multi-resolution data. In effect, the proposed models employ parallel feature extractors for the two modalities, which project the data into a shared latent feature space from which retrieval is performed. However, the main challenge with the conventional retrieval approach is that a query image of the target class is often not available. There may also be a considerable number of classes for which training samples are scarce or absent. For such classes, a zero-shot learning (ZSL) strategy can be helpful: ZSL aims to solve a task without seeing any example of the target class during the training phase. To this end, two efficient algorithms have been proposed that perform sketch-based inter-modal retrieval and are robust under the zero-shot setting; the network can handle a sample from an unseen class (i.e., a new class) encountered during the inference phase, after deployment. Delving further into zero-shot cross-modal retrieval (ZS-CMR), a key problem that limits the training and overall performance of such models in RS applications is their requirement for large annotated training databases.
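The shared-latent-space idea described above can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the architecture from the thesis: the random linear projections stand in for trained deep encoders, and the feature dimensions and cosine-similarity ranking are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy modality-specific features (e.g., optical database vs. a SAR query).
optical_feats = rng.normal(size=(5, 128))   # database: 5 images, 128-D descriptors
sar_query = rng.normal(size=(1, 64))        # query from the other modality, 64-D

# Modality-specific "encoders": random linear projections into a common
# 32-D latent space, standing in for trained deep feature extractors.
W_optical = rng.normal(size=(128, 32))
W_sar = rng.normal(size=(64, 32))

def encode(x, W):
    """Project features into the shared latent space and L2-normalise."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

db = encode(optical_feats, W_optical)
q = encode(sar_query, W_sar)

# Cross-modal retrieval: rank database items by cosine similarity to the query.
scores = (q @ db.T).ravel()
ranking = np.argsort(-scores)
print("retrieval order:", ranking)
```

In a trained system the two projections would be learned jointly so that semantically matching samples from either sensor land close together in the shared space; retrieval then reduces to nearest-neighbour search, as above.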
One of the major bottlenecks in training such models is that they require a considerable amount of labelled data, and acquiring labelled data is both resource-intensive and labour-intensive. This limits the training of ZS-CMR models in practical applications. To this end, a new experimental protocol has been proposed for models that can be trained on very few labelled samples while remaining robust enough to retrieve unseen-class queries; this is referred to as the few-zero-shot (FZS) protocol. A bi-level siamese network has been proposed to solve this task. The proposed network uses a semi-supervised training strategy for mapping the cross-domain samples of the two different sensors. Thus, in this thesis, the problems of uni-modal data retrieval, cross-modal data retrieval, zero-shot cross-modal data retrieval, and the novel problem of few-zero-shot cross-modal data retrieval have been investigated, and deep learning-based solutions have been proposed for each task.
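A common ingredient of siamese training for cross-domain mapping is a contrastive loss over paired samples. The sketch below is a generic NumPy version under assumed embeddings and margin, not the thesis's exact bi-level or semi-supervised formulation; the 4-D vectors and labels are fabricated toy values.

```python
import numpy as np

def contrastive_loss(z_a, z_b, same_class, margin=1.0):
    """Generic siamese contrastive loss over paired embeddings.

    Pulls embeddings of matching cross-domain pairs together and pushes
    non-matching pairs at least `margin` apart in Euclidean distance.
    """
    d = np.linalg.norm(z_a - z_b, axis=1)                    # pairwise distances
    pos = same_class * d ** 2                                # matches: shrink distance
    neg = (1 - same_class) * np.maximum(margin - d, 0.0) ** 2  # non-matches: enforce margin
    return float(np.mean(pos + neg))

# Toy cross-domain pairs (e.g., sketch embedding vs. image embedding).
z_sketch = np.array([[0.0, 0.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0, 0.0]])
z_image = np.array([[0.1, 0.0, 0.0, 0.0],   # close to its sketch, labelled match
                    [0.9, 1.1, 0.0, 0.0]])  # close too, but labelled non-match
labels = np.array([1.0, 0.0])               # 1 = same class, 0 = different class

loss = contrastive_loss(z_sketch, z_image, labels)
```

Minimising such a loss over many pairs is one standard way to align the two sensors' embedding spaces; the second (non-matching) pair above dominates the loss because it violates the margin.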