Attention-Weighted Hierarchical Decoding for Few-Shot Semantic Segmentation: A Case Study on Batik Cultural Heritage Patterns
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Few-shot semantic segmentation aims to learn accurate pixel-level classification from limited annotated samples, a critical capability for real-world applications where data acquisition is expensive or impractical. However, existing methods often struggle with fine-grained texture details and complex boundaries under data-scarce conditions, particularly when applied to domains with intricate visual patterns (such as batik patterns). To address this few-shot learning challenge, we constructed a few-shot batik pattern dataset and proposed a novel network architecture centered on attention weighting and hierarchical decoding. Our method leverages a pre-trained ResNet101 backbone for transfer learning to establish a strong feature foundation. It incorporates a dual-attention module that combines spatial and channel attention to dynamically highlight semantically rich regions and intricate texture boundaries specific to batik. For multi-scale context aggregation, a lightweight module utilizing parallel dilated convolutions is introduced to efficiently capture features from varying receptive fields. Finally, a hierarchical decoder progressively integrates these enhanced, multi-scale features with high-resolution shallow features to reconstruct precise segmentation maps. Comprehensive evaluations on a dedicated batik dataset show that our model achieves state-of-the-art performance, with a mean Intersection over Union (mIoU) of 79.22% and a pixel accuracy (PA) of 92.47%. It notably improves over the strong DeepLabV3+ baseline by 3.3% in mIoU and 0.95% in PA, demonstrating its effectiveness for the task of batik pattern segmentation under data-scarce conditions.