VMUnet-MSADI: Visual Mamba UNet Fusion Multi-Scale Attention and Detail Infusion for Unsound Corn Kernels Segmentation
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Corn seed breeding is a global issue, and has attracted great attention in recent years. Deploying autonomous robots for corn kernel recognition and classification has great potential in terms of constructing environment friendly agriculture, and saving manpower. Existing segmentation methods that utilize U-shaped architectures typically operate by processing images in discrete pixel-based segments. This approach often overlooks the finer pixel-level structural details within these segments, leading to models that struggle to preserve the continuity of target edges effectively. In this paper, we propose a new framework for corn seed image segmentation, called VMUnet-MSADI, which aims to integrate MSADI module into the encoder and decoder of the VMUnet architecture. Our VMUnet-MSADI model benefits from self-attention computation in VMUnet and multiscale coding to efficiently model non-local dependencies and multiscale contexts to improve the segmentation quality of different images. Unlike previous Unet-based improvement schemes, the proposed VMUnet-MSADI adopts a multiscale convolutional attention module coding mechanism at the depth level and an efficient multiscale deep convolutional decoder at the spatial level to extract coarse-grained features and fine-grained features at different semantic scales and effectively avoid the loss of information at the target boundary to improve the quality and accuracy of target segmentation. In addition, we introduce a Visual State Space (VSS) block to capture a wide range of contextual information and a Detail Infusion Block (DIB) to enhance the fusion of low-level and high-level features, which further fills in the remote contextual information during the up-sampling process. Comprehensive experiments were conducted on open-source datasets and the results demonstrate that the VMUnet-MSADI model excels in the task of corn kernel segmentation. The model achieved a segmentation accuracy of 95.96%, surpassing the leading method by 0.9%. Compared to other segmentation models, our method exhibits superior performance in both accuracy and loss metrics. Extensive comparative experiments conducted on various benchmark datasets further substantiate that our approach outperforms the state-of-the-art models. Code, pre-trained models and data processing protocols are available at https://github.com/corbining/VMUnet-MSADI