HybridSeg: An Efficient Multi-Scale Mamba Architecture for Real-Time Semantic Segmentation in Railway Safety Monitoring Systems
Abstract
Real-time semantic segmentation for railway safety monitoring is computationally demanding for edge-deployed vision systems. In this setting, accurate detection directly affects operational safety, yet the available resources are tightly constrained. Existing approaches struggle to balance segmentation accuracy against computational efficiency while remaining robust across varying environmental conditions. To address these challenges, we propose HybridSeg, an efficient multi-scale vision architecture that combines Mamba-based sequence modeling with hierarchical feature fusion for resource-constrained railway monitoring. The framework makes four technical contributions:
(1) Structure-Aware Preprocessing (SAP), which enhances input features through multi-scale structural analysis;
(2) Structure-Aware Deformable Mamba (SADM) blocks, which capture long-range dependencies efficiently through multi-directional scanning and deformable spatial attention;
(3) Multi-Scale Feature Fusion (MSFF), which integrates hierarchical features through cross-scale attention; and
(4) Cross-Scale Consistency (CSC) training, which enforces alignment of features across scales.
The decoder uses gated skip connections to combine features adaptively across resolution levels. Evaluation on railway surveillance datasets shows that HybridSeg achieves 84.8±0.004% mIoU and 93.8±0.003% pixel accuracy while running in real time at 31.8 FPS with only 38.9M parameters. The architecture remains robust across diverse environmental conditions, degrading by only 4.1 percentage points in challenging nighttime scenes. The method segments three safety-critical object classes (personnel, foreign objects, and railway infrastructure), substantially improving safety-critical detection while remaining practical to deploy on resource-limited edge computing platforms.
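The headline results are reported as mIoU and pixel accuracy. For reference, the following sketch shows how these two metrics are conventionally computed. It accumulates a confusion matrix over all evaluated pixels and then derives both scores from that matrix. This is a generic sketch under stated assumptions, not the authors' evaluation code. The helper names, the class indices (0 = background, 1 = personnel, 2 = foreign object, 3 = railway infrastructure) and the choice to include a background class in the mean are all hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """Count pixels per (ground-truth, predicted) class pair; rows = ground truth, cols = prediction."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all pixels whose predicted class equals the ground-truth class."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else 0.0


def mean_iou(cm: List[List[int]]) -> float:
    """Average over classes of TP / (TP + FP + FN); classes absent from both gt and pred are skipped."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted as c but labelled otherwise
        fn = sum(cm[c]) - tp                       # labelled c but predicted otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: a flattened 8-pixel mask using the hypothetical class indices above.
gt = [0, 0, 1, 1, 2, 2, 3, 3]
pred = [0, 1, 1, 1, 2, 0, 3, 3]
cm = confusion_matrix(gt, pred, num_classes=4)
print(pixel_accuracy(cm))       # → 0.75
print(round(mean_iou(cm), 4))   # → 0.625
```

In the toy example, 6 of the 8 pixels are correct, which gives a pixel accuracy of 0.75. The per-class IoUs are 1/3, 2/3, 1/2 and 1, and their mean is 0.625. Because mIoU weights each class equally, a model can post high pixel accuracy while still missing small classes such as foreign objects. Reporting both metrics, as the abstract does, exposes that gap. Accumulating one confusion matrix over the whole dataset before dividing is the usual convention. Averaging per-image scores would give different numbers.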