FMT: Foundation Model-based Transformer for Remote Sensing Change Detection
Abstract
Change detection, a widely studied task in remote sensing, aims to identify significant changes between bi-temporal images. As satellite sensors advance, they capture increasingly complex geographical information, which makes change detection more challenging. Existing models typically use convolutional networks and Transformers to learn changes between bi-temporal images, but they often fail to exploit the knowledge and scalability of foundation models. They also neglect to filter out invariant background information, so unfiltered tokens interfere with model performance. In this work, we demonstrate the advantages of the foundation model and the necessity of token filtering. We propose a Foundation Model-based Transformer for Remote Sensing Change Detection (FMT). We introduce a collaborative feature extraction module that utilises a modified ResNet18 and a frozen foundation model. We also propose a multi-scale cross-axis attention fusion module that combines the general features extracted by the foundation model with the features of the ResNet18 backbone. Additionally, we design an anchor token filtering module that uses algorithms such as TVConv, k-means, and top-k selection to filter anchor tokens for change regions based on a fuzzy prediction map and background information. The relationships between the tokens are then learned through a self-attention mechanism. Finally, a dual cross-attention module lets the original and enhanced features interact, and a convolutional decoder generates the prediction map. FMT was evaluated on the WHU-CD, LEVIR-CD, and DSIFN datasets, where it demonstrated superior performance compared to existing models.
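The top-k step of the anchor token filtering idea can be illustrated with a minimal sketch. This is not the authors' implementation: the TVConv and k-means stages are omitted, and the function name `select_anchor_tokens`, the toy tokens, and the per-token change scores are hypothetical values chosen only for illustration. The sketch assumes that each token already has a scalar change score taken from a coarse prediction map.

```python
from typing import List, Sequence, Tuple


def select_anchor_tokens(
    tokens: Sequence[Sequence[float]],
    change_scores: Sequence[float],
    k: int,
) -> Tuple[List[int], List[List[float]]]:
    """Keep the k tokens with the highest change score (hypothetical helper)."""
    if len(tokens) != len(change_scores):
        raise ValueError("one score per token is required")
    if k <= 0:
        raise ValueError("k must be positive")
    k = min(k, len(tokens))
    # Rank token indices by descending score; ties keep the lower index first.
    ranked = sorted(range(len(tokens)), key=lambda i: (-change_scores[i], i))
    # Restore the original (spatial) order of the surviving tokens.
    kept = sorted(ranked[:k])
    return kept, [list(tokens[i]) for i in kept]


# Toy 2x3 feature map flattened to six 2-dimensional tokens (made-up values).
tokens = [[0.1, 0.2], [0.9, 0.8], [0.3, 0.1], [0.7, 0.9], [0.2, 0.2], [0.8, 0.6]]
# Assumed per-token change probabilities from a coarse prediction map.
change_scores = [0.05, 0.92, 0.10, 0.81, 0.07, 0.64]

indices, anchors = select_anchor_tokens(tokens, change_scores, k=3)
print(indices)  # [1, 3, 5]
```

Only the three highest-scoring tokens survive, so any later attention step would operate on likely change regions rather than on the invariant background. Re-sorting the kept indices is a design choice of this sketch that preserves the tokens' spatial order.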