GH-UNet: Group-wise Hybrid Convolution-VIT for Robust Medical Image Segmentation
Abstract
Medical image segmentation is essential for accurate diagnosis and effective treatment. Although U-Net-based architectures have shown outstanding performance in this domain, they are limited in effectively capturing long-range contextual relationships within complex anatomical structures. Consequently, this study presents GH-UNet, a Group-wise Hybrid Convolution-VIT model under the U-Net framework for robust medical image segmentation. GH-UNet is designed to efficiently capture local details and long-range dependencies while accurately delineating target boundaries in complex medical scenarios. First, we designed a hybrid convolution-Vision Transformer (VIT) encoder to effectively capture both local features and long-range dependencies. Second, a Group-wise Dynamic Gating (GDG) component was developed to dynamically adjust feature weights, thereby enhancing the effectiveness of feature expression. Third, we implemented a cascade mechanism in the decoder to integrate information across multiple scales. Notably, both the hybrid convolution-VIT encoder and the GDG component are modular, allowing broad application across CNN or VIT architectures. Experimental results demonstrate that GH-UNet performs exceptionally well on five public and one private 2D/3D medical image segmentation datasets. For example, on the ISIC2016 dataset, GH-UNet outperformed the H2Former model, improving the DICE and IOU scores by 1.37% and 1.94%, respectively, while requiring only 38% of the parameters and 49.61% of the FLOPs of H2Former. The source code is freely accessible at: https://github.com/xiachashuanghua/GH-UNet.
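The abstract describes the GDG component only at a high level: feature weights are adjusted dynamically on a per-group basis. The exact formulation is in the linked repository; the following is a minimal NumPy sketch of one plausible reading, in which channels are split into groups, each group is globally pooled to a scalar, and a sigmoid gate derived from that scalar rescales the group's channels. The function name, parameters `w` and `b`, and the pooling choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def group_wise_dynamic_gating(x, num_groups, w, b):
    """Illustrative sketch of group-wise dynamic gating (not the paper's code).

    x          : feature map of shape (C, H, W); C divisible by num_groups.
    num_groups : number of channel groups to gate independently.
    w, b       : per-group gate parameters, each of shape (num_groups,)
                 (hypothetical stand-ins for learned weights).
    Each group's gate is a sigmoid of a linear function of its globally
    pooled response; the gate rescales every channel in that group.
    """
    C, H, W = x.shape
    g = x.reshape(num_groups, C // num_groups, H, W)
    pooled = g.mean(axis=(1, 2, 3))                   # one scalar per group
    gate = 1.0 / (1.0 + np.exp(-(w * pooled + b)))    # sigmoid gate in (0, 1)
    return (g * gate[:, None, None, None]).reshape(C, H, W)

# Example: with zero gate parameters every sigmoid is 0.5,
# so all features are uniformly halved.
x = np.ones((4, 2, 2))
out = group_wise_dynamic_gating(x, num_groups=2,
                                w=np.zeros(2), b=np.zeros(2))
```

With learned `w` and `b`, the gates become input-dependent, letting the network emphasize or suppress whole channel groups per sample, which matches the abstract's claim of "dynamically adjust[ing] feature weights."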