Weakly Supervised Semantic Segmentation Based on Subspace-Decoupled Representations and Cross-Layer CAM Structural Alignment
Abstract
Weakly Supervised Semantic Segmentation (WSSS) aims to learn pixel-level semantic predictions using only image-level annotations. However, due to the absence of precise spatial supervision, the generated Class Activation Maps (CAMs) often highlight only the most discriminative regions of objects, resulting in incomplete object coverage and unstable cross-layer semantic responses. To address these challenges, we propose a token-level contrastive-learning framework for WSSS that improves CAM localization quality by enhancing feature representations and enforcing cross-layer structural consistency. Specifically, we first introduce a multi-subspace token-level contrastive module, which decouples feature representations through a shared semantic backbone and multiple projection subspaces, thereby increasing the diversity and discriminability of the embedding space. Furthermore, we propose a cross-layer CAM structural alignment module that jointly constrains both the response intensity and the spatial structural relationships of CAMs across different Transformer layers, leading to more stable semantic localization and improved spatial consistency of object regions. Extensive experiments on the PASCAL VOC 2012 and MS COCO 2014 benchmarks demonstrate that the proposed method consistently improves segmentation performance under an end-to-end training framework. In particular, it achieves 71.8% (val) and 72.3% (test) mIoU on VOC 2012, and 42.6% mIoU on the COCO 2014 validation set. Further ablation studies validate the effectiveness of each component. Overall, our method significantly enhances the completeness and structural stability of CAMs, providing an effective solution for representation learning and structural modeling in WSSS.
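To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the paper's actual implementation: `multi_subspace_projections` maps shared-backbone token features through several projection subspaces (the embeddings that a token-level contrastive loss would compare), and `cam_alignment_loss` pairs an intensity-consistency term with a structural term that compares pairwise response-difference matrices of CAMs from two layers. All function names, the choice of absolute pairwise differences as the "structure", and the loss weighting `lam` are illustrative assumptions.

```python
import numpy as np

def multi_subspace_projections(tokens, proj_mats):
    """Project shared token features into multiple subspaces.

    tokens:    (N, D) token features from a shared backbone.
    proj_mats: list of (D, d) projection matrices, one per subspace.
    Returns a list of (N, d) L2-normalized embeddings, one per subspace,
    suitable as inputs to a token-level contrastive loss.
    """
    outs = []
    for W in proj_mats:
        z = tokens @ W
        z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
        outs.append(z)
    return outs

def cam_alignment_loss(cam_a, cam_b, lam=1.0):
    """Toy cross-layer CAM alignment: intensity + structural consistency.

    cam_a, cam_b: flattened (H*W,) CAMs from two Transformer layers.
    The intensity term matches normalized responses pointwise; the
    structural term matches pairwise response-difference matrices,
    a simple stand-in for spatial structural relationships.
    """
    a = cam_a / (cam_a.sum() + 1e-8)
    b = cam_b / (cam_b.sum() + 1e-8)
    intensity = np.mean((a - b) ** 2)
    # Pairwise |response_i - response_j| matrices encode relative structure.
    Sa = np.abs(a[:, None] - a[None, :])
    Sb = np.abs(b[:, None] - b[None, :])
    structure = np.mean((Sa - Sb) ** 2)
    return intensity + lam * structure
```

In this toy form, identical CAMs from two layers yield zero loss, while layers whose activation patterns disagree are penalized on both absolute response and relative spatial structure.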