FusionFormer-X: Hierarchical Self-Attentive Multimodal Transformer for HSI-LiDAR Remote Sensing Scene Understanding


Abstract

The fusion of complementary modalities has become a central theme in remote sensing (RS), particularly in leveraging Hyperspectral Imaging (HSI) and Light Detection and Ranging (LiDAR) data for more accurate scene classification. In this paper, we introduce FusionFormer-X, a novel transformer-based architecture that systematically unifies multi-resolution heterogeneous data for RS tasks. FusionFormer-X is specifically designed to address the challenges of modality discrepancy, spatial-spectral alignment, and fine-grained feature representation. First, we embed convolutional tokenization modules that transform raw HSI and LiDAR inputs into semantically rich patch embeddings while preserving spatial locality. Next, we propose a Hierarchical Multi-Scale Multi-Head Self-Attention (H-MSMHSA) mechanism, which performs cross-modal interaction in a coarse-to-fine manner, enabling robust alignment between high-spectral-resolution HSI and lower-spatial-resolution LiDAR data. We validate our framework on public RS benchmarks, including the Trento and MUUFL datasets, demonstrating superior classification performance over current state-of-the-art multimodal fusion models. These results underscore the potential of FusionFormer-X as a foundational backbone for high-fidelity multimodal remote sensing understanding.
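To make the two core ingredients of the abstract concrete, the sketch below illustrates (1) convolutional tokenization of HSI and LiDAR patches into embeddings and (2) a single cross-modal attention block in which HSI tokens attend to LiDAR tokens. This is a minimal PyTorch sketch under assumed shapes and module names (ConvTokenizer, CrossModalAttention, a 144-band HSI cube, a one-channel LiDAR raster); it is not the authors' implementation of H-MSMHSA, which operates hierarchically over multiple scales.

```python
import torch
import torch.nn as nn

class ConvTokenizer(nn.Module):
    """Convolutional tokenization: projects an image-like input into a sequence of
    patch embeddings while preserving spatial locality (stand-in for the paper's module)."""
    def __init__(self, in_channels, embed_dim, patch_size):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                          # x: (B, C, H, W)
        tokens = self.proj(x)                      # (B, D, H/p, W/p)
        return tokens.flatten(2).transpose(1, 2)   # (B, N, D) token sequence

class CrossModalAttention(nn.Module):
    """One cross-modal attention block: HSI tokens query LiDAR tokens. Stacking such
    blocks at several token resolutions would approximate a coarse-to-fine scheme."""
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, hsi_tokens, lidar_tokens):
        fused, _ = self.attn(query=hsi_tokens, key=lidar_tokens, value=lidar_tokens)
        return self.norm(hsi_tokens + fused)       # residual fusion of the two modalities

# Toy usage: a 144-band HSI patch and a single-channel LiDAR raster of the same extent.
hsi = torch.randn(2, 144, 32, 32)
lidar = torch.randn(2, 1, 32, 32)
hsi_tok = ConvTokenizer(144, 64, patch_size=4)(hsi)    # (2, 64 tokens, 64)
lidar_tok = ConvTokenizer(1, 64, patch_size=4)(lidar)  # (2, 64 tokens, 64)
fused = CrossModalAttention(64, num_heads=4)(hsi_tok, lidar_tok)
print(fused.shape)  # torch.Size([2, 64, 64])
```

In this toy setup both modalities are tokenized at the same patch size; the paper's hierarchical design instead aligns tokens across different spatial scales, which the abstract describes only at a high level.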
