Enhancing Weakly Supervised Semantic Segmentation through Multi-Class Token Attention Learning

Huilan Luo
Zhen Zeng

Abstract

Weakly supervised semantic segmentation (WSSS) using image-level class labels is challenging due to the limitations of Class Activation Maps (CAMs) in convolutional neural networks (CNNs), which often highlight only the most discriminative image regions. We propose the Hierarchical Multi-Class Token Attention Network (HMCTANet), a novel approach leveraging a Conformer backbone that integrates CNN and Transformer branches. HMCTANet enhances CAMs through multi-class token attention and a Class-Aware Training (CAT) strategy that aligns class tokens with ground-truth labels. Additionally, we introduce a Class Token Regularization Module (CTRM) to improve the discriminative power of class tokens. Our Refinement Module (RM) further refines segmentation by combining class-specific attention and patch-level affinity from the Transformer branch with the CAMs from the CNN branch. HMCTANet achieves state-of-the-art performance, with mIoU scores of 69.0% and 68.4% on the PASCAL VOC 2012 validation and test sets, respectively, demonstrating the effectiveness of our approach for WSSS tasks.
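
The abstract describes combining CAMs from the CNN branch with class-specific attention and patch-level affinity from the Transformer branch. The NumPy sketch below illustrates that general idea under stated assumptions. It is not HMCTANet's Refinement Module. The function names (`compute_cam`, `refine_cam`, `pseudo_label`), the multiplicative fusion, the row-normalised affinity propagation, and the 0.4 background threshold are all hypothetical choices made for illustration. Random arrays stand in for real network outputs.

```python
import numpy as np


def compute_cam(features, weights):
    """Class activation maps from CNN features.

    features: (C, H, W) feature map; weights: (K, C) classifier weights.
    Returns (K, H, W) maps scaled per class into [0, 1).
    """
    cams = np.einsum("kc,chw->khw", weights, features)
    cams = np.maximum(cams, 0.0)  # ReLU: keep only positive class evidence
    peak = cams.reshape(cams.shape[0], -1).max(axis=1)
    return cams / (peak[:, None, None] + 1e-5)


def refine_cam(cams, class_attn, affinity, labels):
    """Hypothetical fusion of CNN CAMs with Transformer attention (illustrative only).

    cams: (K, H, W); class_attn: (K, N) class-to-patch attention, N = H * W;
    affinity: (N, N) non-negative patch-to-patch attention; labels: (K,) 0/1
    image-level labels. Returns refined (K, H, W) maps in [0, 1).
    """
    k, h, w = cams.shape
    attn = class_attn.reshape(k, h, w)
    attn = attn / (attn.reshape(k, -1).max(axis=1)[:, None, None] + 1e-5)
    # Keep regions supported by both branches; zero out classes absent from the label.
    fused = cams * attn * labels[:, None, None]
    # Row-normalise so each patch takes a weighted average over all patches.
    aff = affinity / (affinity.sum(axis=1, keepdims=True) + 1e-8)
    refined = fused.reshape(k, -1) @ aff.T  # refined[c, i] = sum_j aff[i, j] * fused[c, j]
    peak = refined.max(axis=1)
    return (refined / (peak[:, None] + 1e-5)).reshape(k, h, w)


def pseudo_label(refined, threshold=0.4):
    """Per-pixel label map: 0 = background, c + 1 = foreground class c.

    The 0.4 threshold is a hypothetical default, not a value from the paper.
    """
    score = refined.max(axis=0)
    label = refined.argmax(axis=0) + 1
    label[score < threshold] = 0  # low confidence for every class -> background
    return label


# Usage with random stand-ins for real network outputs.
rng = np.random.default_rng(0)
K, C, H, W = 3, 8, 4, 4
cams = compute_cam(rng.random((C, H, W)), rng.standard_normal((K, C)))
refined = refine_cam(
    cams,
    rng.random((K, H * W)),
    rng.random((H * W, H * W)),
    np.array([1, 0, 1]),  # class 1 is absent from the image-level label
)
mask = pseudo_label(refined)
print(mask.shape)  # (4, 4)
```

Multiplying the two maps keeps only regions where both branches agree. The affinity step then spreads scores to patches that attend strongly to one another, which is one common way to counter the tendency of CAMs to highlight only the most discriminative regions.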

Version published as 10.21203/rs.3.rs-4716623/v1 on Research Square, Aug 5, 2024

Related articles

SCFI-ESeg: Enhancing Semantic Segmentation with Spatial and Content Feature Integration

This article has 5 authors:
1. Ning Li
2. Xudong Zhang
3. Bo Li
4. Baohua Yuan
5. Gaochao Yang
Latest version: Oct 29, 2024

SemiTabDETR: End-to-End Semi-Supervised Table Detection with Transformer-based Enhanced Query Approach

This article has 3 authors:
1. Tahira Shehzadi
2. Didier Stricker
3. Muhammad Zeshan Afzal
Latest version: Oct 25, 2024

RAVL: A Region Attention Yolo with Two-Stage Training for Enhanced Object Detection

This article has 3 authors:
1. Weiwen Cai
2. Huiqian Du
3. Min Xie
Latest version: Nov 4, 2024