MSPNet: Multiscale Scene Parsing Network

Abstract

Existing scene parsing networks suffer from high parameter counts, high computational complexity, and difficulty achieving effective multiscale feature representation. To address these problems, this paper proposes MSPNet, a lightweight multiscale scene parsing network. MSPNet adopts StarNet as the backbone and embeds an efficient pixel localization attention (EPLA) module into PSPNet, which yields a significant performance improvement. The EPLA module integrates two submodules, ELA and PagFM. The ELA module uses a dynamic weight allocation mechanism to achieve precise pixel-level feature localization while reducing the computational overhead of the attention mechanism. The PagFM module constructs a hierarchical pyramid feature fusion architecture that guides and fuses features at different scales. Working together, these two modules greatly enhance the network's ability to represent targets at multiple scales. In addition, MSPNet uses depthwise separable convolutions and channel reparameterization to keep the design lightweight and computationally efficient. Experimental results on the Pascal VOC2012 validation set show that MSPNet improves the mean intersection over union (mIoU) by 1.79% over PSPNet. Through the joint optimization of these modules, MSPNet reaches GFLOPs and parameter counts comparable to those of the MobileNet series, providing an efficient solution for real-time semantic segmentation on mobile devices.
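The abstract credits depthwise separable convolutions for part of the lightweight design. The sketch below shows the standard parameter and multiply-accumulate (MAC) arithmetic behind that claim. The layer sizes (3×3 kernel, 256 input and output channels, 32×32 feature map) are hypothetical values chosen for illustration and are not taken from MSPNet's configuration; biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    # One k x k x c_in filter per output channel (bias ignored).
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    depthwise = k * k * c_in  # one k x k filter per input channel, no channel mixing
    pointwise = c_in * c_out  # 1x1 convolution that mixes channels
    return depthwise + pointwise


def macs(weights: int, h_out: int, w_out: int) -> int:
    # For a stride-1, zero-padded ("same") convolution, every weight is used
    # once per output position, so MACs = weights * H_out * W_out.
    return weights * h_out * w_out


# Hypothetical layer sizes, for illustration only.
k, c_in, c_out, h, w = 3, 256, 256, 32, 32
std = standard_conv_params(k, c_in, c_out)        # 589,824 weights
sep = depthwise_separable_params(k, c_in, c_out)  # 67,840 weights
ratio = sep / std                                 # equals 1/c_out + 1/k**2
print(f"standard: {std} params, {macs(std, h, w)} MACs")
print(f"separable: {sep} params, {macs(sep, h, w)} MACs, ratio {ratio:.4f}")
```

The ratio 1/C_out + 1/k² (about 0.115 here, roughly 8.7× fewer weights) is why separable convolutions dominate mobile designs such as the MobileNet series. Note that papers differ on whether reported GFLOPs count MACs or individual floating-point operations (about 2× MACs), so comparisons should use the same convention.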
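The ELA submodule is described as allocating weights dynamically to localize features at the pixel level. A common way to do this cheaply, and the idea behind ELA-style localization attention, is to pool each channel into a row descriptor (length H) and a column descriptor (length W), run a small 1D convolution over each, squash with a sigmoid, and rescale every pixel by its row and column weights. The pure-Python sketch below assumes that design; it omits the normalization layer used in published ELA, shares one hypothetical kernel across all channels, and is not the authors' implementation.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][col]


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def conv1d_same(seq: List[float], kernel: List[float]) -> List[float]:
    # Zero-padded 1D convolution (cross-correlation) that keeps the length.
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd for 'same' padding")
    pad = len(kernel) // 2
    out = []
    for i in range(len(seq)):
        s = 0.0
        for j, wt in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < len(seq):
                s += wt * seq[idx]
        out.append(s)
    return out


def localization_attention(x: Tensor3, kernel: List[float]) -> Tensor3:
    """Simplified ELA-style attention: out[c][h][w] = x[c][h][w] * a_h[h] * a_w[w]."""
    out: Tensor3 = []
    for ch in x:
        H, W = len(ch), len(ch[0])
        # Strip pooling: average along width (per row) and along height (per column).
        row_desc = [sum(row) / W for row in ch]
        col_desc = [sum(ch[h][w] for h in range(H)) / H for w in range(W)]
        # Input-dependent weights in (0, 1), one per row and one per column.
        a_h = [sigmoid(v) for v in conv1d_same(row_desc, kernel)]
        a_w = [sigmoid(v) for v in conv1d_same(col_desc, kernel)]
        out.append([[ch[h][w] * a_h[h] * a_w[w] for w in range(W)] for h in range(H)])
    return out


x = [[[1.0, 2.0], [3.0, 4.0]]]  # one channel, 2x2 map
print(localization_attention(x, [0.1, 0.5, 0.1]))  # hypothetical kernel values
```

Because the 1D convolutions run on H + W pooled values per channel rather than on the full H×W map, the cost of computing the weights grows linearly in the spatial size, which is consistent with the abstract's claim of low attention overhead.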
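The headline result is reported as mIoU on the Pascal VOC2012 validation set (21 classes: 20 object categories plus background). The sketch below shows the standard way this metric is computed: accumulate a confusion matrix over all labeled pixels, skipping the VOC void label 255, then average the per-class IoU = TP / (TP + FP + FN). The tiny label arrays in the usage lines are made up for illustration.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(gt: Sequence[int], pred: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    # Rows are ground-truth classes, columns are predicted classes.
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # VOC marks void/boundary pixels with 255; they are not scored
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> Tuple[float, List[float]]:
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # class c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp    # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    if not ious:
        raise ValueError("no labeled pixels to evaluate")
    return sum(ious) / len(ious), ious


# Made-up flattened label maps; the last ground-truth pixel is void (255).
cm = confusion_matrix([0, 0, 1, 1, 2, 255], [0, 1, 1, 1, 2, 0], num_classes=3)
miou, per_class = mean_iou(cm)
print(per_class, miou)  # per-class IoU 0.5, 2/3, 1.0; mIoU about 0.722
```

In the usual VOC protocol the confusion matrix is accumulated over the whole validation set before computing IoU, rather than averaging per-image scores, so the sketch takes flattened labels from any number of images.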
