PUNet: A Lightweight Parallel U-Net Architecture Integrating Mamba-CNN for High-Precision Image Segmentation
Abstract
Real-time high-precision image segmentation on mobile and edge devices remains challenging due to the limited ability of traditional convolutional networks to model long-range dependencies and their high computational cost at high resolutions. We propose PUNet, a lightweight parallel U-Net variant that integrates depthwise separable convolutions (DSConv) with a structured state-space Mamba module in a dual-path encoder. The core component, the Parallel Structured State-Space Encoder (PSSSE), employs two branches to efficiently capture local spatial features (via DSConv) and model global semantic dependencies (via the Visual Mamba Layer), while a Squeeze-and-Excitation skip connection adaptively fuses multi-scale features. With only 0.26 M parameters and linear computational complexity, PUNet enables real-time inference on resource-constrained platforms. Experiments on the CamVid and CRACK500 datasets demonstrate superior performance, achieving validation Dice scores of 0.9208 and 0.7902, and mean Intersection-over-Union (IoU) of 0.8643 and 0.6612, respectively, significantly outperforming other lightweight models.
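The small parameter budget comes largely from the DSConv branch: a depthwise separable convolution factorizes a standard k×k convolution into a per-channel depthwise step followed by a 1×1 pointwise step. As a back-of-envelope illustration (the channel sizes below are hypothetical, not taken from PUNet's actual configuration):

```python
def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias terms omitted)."""
    return c_in * c_out * k * k

def dsconv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise separable convolution:
    one k x k depthwise filter per input channel, then a 1x1 pointwise conv."""
    return c_in * k * k + c_in * c_out

# Hypothetical layer: 64 -> 128 channels with a 3x3 kernel.
standard = conv2d_params(64, 128, 3)   # 64*128*9  = 73,728 weights
separable = dsconv_params(64, 128, 3)  # 576+8,192 =  8,768 weights
print(standard, separable)
```

For this layer the separable form uses roughly an eighth of the weights, which is how repeated DSConv blocks keep the total model size in the sub-million-parameter range.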