MSP-Net: Multi-Scale Spectrum Pyramid Network for Robust Synthetic Aperture Radar Automatic Target Recognition
Abstract
Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) remains challenging because of speckle noise, aspect-angle variation, and the loss of fine scattering cues in conventional deep-learning pipelines. Spatial-domain CNNs primarily extract geometric structure but overlook the high-frequency information that is critical for distinguishing small or spectrally similar targets, while frequency-only methods such as FFTNet fail to exploit spatial context and multi-scale spectral variation. To address these limitations, this study proposes the Multi-Scale Spectrum Pyramid Network (MSP-Net), which decomposes SAR images into low-, mid-, and high-frequency components using a two-dimensional Fourier transform followed by band-pass filtering, and processes each band through dual convolutional branches equipped with predefined and learnable spectral filters. The resulting features are fused by one of three integration mechanisms: attention-based, MLP-based, or transformer-based. Experiments on two MSTAR-based benchmark datasets (11-class and 8-class) demonstrate that MSP-Net substantially outperforms spatial-only CNNs and single-scale frequency-domain models. In the 11-class setting, MSP-Net improves accuracy by 13–14% over these baselines, reaching up to 95%, and achieves near-perfect ROC separability (AUC ≈ 1.0) with reliable calibration (expected calibration error, ECE < 0.02). On the reduced 8-class dataset, the best MSP-Net variant achieves 99.9% accuracy with consistent per-class F1-scores. Ablation studies confirm the critical role of multi-scale spectral decomposition and adaptive fusion in improving recognition of small and spectrally similar targets such as BMP2, BTR60, and BTR70. These results highlight the effectiveness of frequency-aware, multi-scale learning for robust and interpretable SAR ATR.
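The abstract does not specify the band-pass filter design, so the following is only a minimal sketch of the general idea of splitting an image into low-, mid-, and high-frequency components with a 2-D Fourier transform. It assumes ideal (hard) radial masks with hypothetical cutoffs at 0.25 and 0.6 of the Nyquist frequency; the function names `radial_band_masks` and `decompose_bands` and the cutoff values are illustrative and are not taken from MSP-Net.

```python
import numpy as np


def radial_band_masks(shape, cutoffs=(0.25, 0.6)):
    """Return boolean low/mid/high masks over an unshifted 2-D FFT grid.

    Radius is measured in units of the Nyquist frequency (0.5 cycles/pixel),
    so r = 1 on the axes at Nyquist and r = sqrt(2) at the corners.
    The cutoff values are hypothetical placeholders, not MSP-Net's settings.
    """
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]  # row frequencies, cycles/pixel in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]  # column frequencies, cycles/pixel in [-0.5, 0.5)
    r = np.sqrt(fx ** 2 + fy ** 2) / 0.5
    lo_cut, hi_cut = cutoffs
    low = r < lo_cut
    mid = (r >= lo_cut) & (r < hi_cut)
    high = r >= hi_cut
    return low, mid, high  # disjoint masks that together cover every frequency


def decompose_bands(image, cutoffs=(0.25, 0.6)):
    """Split a real 2-D image into low-, mid- and high-frequency images."""
    image = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(image)
    bands = []
    for mask in radial_band_masks(image.shape, cutoffs):
        # Ideal band-pass: keep the coefficients inside the mask, zero the rest,
        # and transform back. The masks are symmetric in +/- frequency, so the
        # inverse is real up to rounding and the imaginary part can be dropped.
        bands.append(np.fft.ifft2(spectrum * mask).real)
    return bands


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chip = rng.random((64, 64))  # random stand-in for a SAR image chip
    low, mid, high = decompose_bands(chip)
    stacked = np.stack([low, mid, high])  # one channel per frequency band
    print(stacked.shape)  # (3, 64, 64)
    print(np.allclose(low + mid + high, chip))  # True: the bands partition the spectrum
```

Because the three masks partition the spectrum, the bands sum back to the input, which makes the decomposition lossless before any learning happens. Stacking them gives a three-channel array that could be routed to separate convolutional branches; MSP-Net additionally uses learnable spectral filters, which this sketch does not model. Hard masks can cause ringing in the spatial domain, so smooth transitions (for example Gaussian or Butterworth profiles) are a common alternative.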