HCTNet: Hybrid CNN--Mamba Network for Real-Time Semantic Segmentation in Urban Traffic Scenes
Abstract
Real-time semantic segmentation in urban traffic scenes must preserve fine structures while capturing long-range context under strict latency constraints. We propose HCTNet, a hybrid CNN--Mamba framework that performs single-branch CNN inference and leverages a training-only Mamba auxiliary branch to inject global context during optimization. The method introduces a lightweight Convolutional State Module (CSM) to enlarge the effective receptive field within the CNN backbone and a Feature Alignment Module (FAM) to align multi-scale representations from the CNN and Mamba streams via spatial/channel projections and gated fusion. A single shared decoder is used for all streams during training to enforce a common prediction space; at test time only the CNN path with the shared decoder is executed to retain real-time efficiency. On Cityscapes, HCTNet attains 81.0\% mean Intersection-over-Union (mIoU) at 60.5 frames per second (FPS) and reaches up to 108.9 FPS with an optimized inference setting; under a reduced input scale it achieves 80.3\% mIoU at 98.6 FPS. On ApolloScape (mapped to the Cityscapes-19 taxonomy), HCTNet obtains 73.8\% mIoU. Qualitative results show sharper boundaries and more coherent predictions for small and distant objects. Ablation studies indicate that receptive-field enhancement from CSM, training-time global guidance from the Mamba branch, and multi-scale alignment through FAM jointly account for the gains, while the shared decoder regularizes predictions without increasing inference cost.
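To make the alignment-and-fusion idea concrete, the following PyTorch sketch shows one plausible reading of a FAM-style block as described above: both streams are projected to a common channel width, the Mamba features are resampled to the CNN resolution, and a learned gate mixes the two. The class name `FeatureAlignment`, the 1x1-convolution projections, the sigmoid gate, and all channel widths are illustrative assumptions, not the authors' implementation; per the abstract, this fusion would only be active during training, with the CNN path and shared decoder run alone at inference.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAlignment(nn.Module):
    """Hypothetical FAM-style block (assumed design, not the paper's code):
    spatial/channel alignment of CNN and Mamba features followed by gated fusion."""

    def __init__(self, cnn_channels: int, mamba_channels: int, fused_channels: int):
        super().__init__()
        # Channel projections (1x1 convs) bring both streams to a common width.
        self.proj_cnn = nn.Conv2d(cnn_channels, fused_channels, kernel_size=1)
        self.proj_mamba = nn.Conv2d(mamba_channels, fused_channels, kernel_size=1)
        # Gate predicts per-pixel, per-channel mixing weights from both streams.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * fused_channels, fused_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_cnn: torch.Tensor, f_mamba: torch.Tensor) -> torch.Tensor:
        # Spatial alignment: resize the Mamba features to the CNN resolution.
        f_mamba = F.interpolate(
            f_mamba, size=f_cnn.shape[-2:], mode="bilinear", align_corners=False
        )
        c = self.proj_cnn(f_cnn)
        m = self.proj_mamba(f_mamba)
        # Gated fusion: convex combination of the two aligned streams.
        g = self.gate(torch.cat([c, m], dim=1))
        return g * c + (1.0 - g) * m


if __name__ == "__main__":
    fam = FeatureAlignment(cnn_channels=128, mamba_channels=256, fused_channels=128)
    f_cnn = torch.randn(1, 128, 64, 128)   # CNN-stream feature map
    f_mamba = torch.randn(1, 256, 32, 64)  # coarser Mamba-stream feature map
    fused = fam(f_cnn, f_mamba)            # (1, 128, 64, 128), fed to the shared decoder during training
    print(fused.shape)
```

In this reading, dropping the Mamba branch and the gate at test time leaves only the CNN projections on the inference path, which is consistent with the abstract's claim that the auxiliary branch adds no inference cost.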