VMUnet-MSADI: Visual Mamba UNet Fusion Multi-Scale Attention and Detail Infusion for Unsound Corn Kernels Segmentation

Kuibin Zhao
Qinghui Zhang
Chenxia Wan
Quan Pan
Yao Qin

Abstract

Corn seed breeding is a global concern that has attracted great attention in recent years. Deploying autonomous robots for corn kernel recognition and classification has great potential for building environmentally friendly agriculture and saving manpower. Existing segmentation methods based on U-shaped architectures typically process images in discrete pixel-based segments. This approach often overlooks the finer pixel-level structural details within these segments, so the resulting models struggle to preserve the continuity of target edges. In this paper, we propose a new framework for corn seed image segmentation, called VMUnet-MSADI, which integrates an MSADI module into the encoder and decoder of the VMUnet architecture. The model benefits from the self-attention computation in VMUnet and from multiscale coding to efficiently model non-local dependencies and multiscale context, improving segmentation quality across different images. Unlike previous Unet-based improvement schemes, VMUnet-MSADI adopts a multiscale convolutional attention coding mechanism at the depth level and an efficient multiscale deep convolutional decoder at the spatial level. Together these extract coarse-grained and fine-grained features at different semantic scales and avoid the loss of information at target boundaries, improving the quality and accuracy of segmentation. In addition, we introduce a Visual State Space (VSS) block to capture a wide range of contextual information and a Detail Infusion Block (DIB) to enhance the fusion of low-level and high-level features, which further fills in long-range contextual information during up-sampling. Comprehensive experiments on open-source datasets demonstrate that VMUnet-MSADI excels at corn kernel segmentation. The model achieved a segmentation accuracy of 95.96%, surpassing the leading method by 0.9%. Compared with other segmentation models, our method shows superior performance in both accuracy and loss metrics. Extensive comparative experiments on various benchmark datasets further substantiate that our approach outperforms state-of-the-art models. Code, pre-trained models and data processing protocols are available at https://github.com/corbining/VMUnet-MSADI
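
The abstract reports a segmentation accuracy but does not define the metric. As a minimal sketch, assuming "accuracy" means per-pixel accuracy over integer label masks, the following shows how such a score, along with the commonly paired intersection-over-union, could be computed. The function names `pixel_accuracy` and `iou` and the toy masks are illustrative assumptions and are not taken from the authors' repository.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # 2D grid of integer class labels


def _check_shapes(pred: Mask, target: Mask) -> None:
    """Raise if the two masks are empty or differ in shape."""
    if len(pred) == 0 or len(pred) != len(target):
        raise ValueError("masks must be non-empty and have the same height")
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) == 0 or len(pred_row) != len(target_row):
            raise ValueError("mask rows must be non-empty and equal in width")


def pixel_accuracy(pred: Mask, target: Mask) -> float:
    """Fraction of pixels whose predicted label equals the ground truth."""
    _check_shapes(pred, target)
    correct = 0
    total = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            correct += int(p == t)
            total += 1
    return correct / total


def iou(pred: Mask, target: Mask, label: int = 1) -> float:
    """Intersection over union for a single class label."""
    _check_shapes(pred, target)
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == label
            in_target = t == label
            intersection += int(in_pred and in_target)
            union += int(in_pred or in_target)
    # Convention assumed here: a class absent from both masks scores 1.0.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    # Toy 2x3 masks: 1 = kernel, 0 = background (hypothetical data).
    predicted = [[0, 1, 1], [0, 1, 0]]
    ground_truth = [[0, 1, 0], [0, 1, 0]]
    print(f"pixel accuracy: {pixel_accuracy(predicted, ground_truth):.4f}")
    print(f"IoU (kernel):   {iou(predicted, ground_truth):.4f}")
```

Pixel accuracy can look high when background dominates the image, which is why a boundary-sensitive or overlap-based score such as IoU is often reported alongside it.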

Version published to 10.21203/rs.3.rs-5170853/v1 on Research Square
Nov 4, 2024

Related articles

SCFI-ESeg: Enhancing Semantic Segmentation with Spatial and Content Feature Integration

This article has 5 authors:
1. Ning Li
2. Xudong Zhang
3. Bo Li
4. Baohua Yuan
5. Gaochao Yang
This article has no evaluations. Latest version Oct 29, 2024

JND-Based Illumination Compensation and DoG-Mask RCNN Optimization for Accurate Radish Image Segmentation

This article has 1 author:
1. 莫冠宗
This article has no evaluations. Latest version Nov 4, 2024

RAVL: A Region Attention Yolo with Two-Stage Training for Enhanced Object Detection

This article has 3 authors:
1. Weiwen Cai
2. Huiqian Du
3. Min Xie
This article has no evaluations. Latest version Nov 4, 2024
