VMamba-Rail: Adaptive State Space Segmentation with Reinforcement Learning for Edge-Deployed Railway Safety Monitoring

Huijin Fu
Zhen Ma
Xue Yang
Wanpeng Zhang
Lei Hu
Ke Jiang

Abstract

Edge-deployed railway safety monitoring demands pixel-accurate semantic segmentation under extreme constraints: detection failures threaten operational safety, yet trackside devices must handle diverse environmental conditions (fog, rain, nighttime) at over 30 FPS with fewer than 50M parameters. Traditional CNNs lack the long-range modeling needed to detect distant obstacles, while Transformer-based methods incur quadratic computational complexity that is prohibitive for edge deployment. We present HybridSeg, a novel architecture that reformulates visual state space modeling as a controllable Markov Decision Process, enabling context-adaptive information propagation through reinforcement learning. Our approach integrates: (1) meta-learned state space dynamics via Proximal Policy Optimization, in which learned policies dynamically adjust state transition parameters based on gradient feedback and hidden state statistics, which is the first application of RL-based parameter adaptation to visual state space models; (2) Structure-Aware Deformable Mamba blocks that combine four-directional scanning with deformable spatial attention to handle irregular geometry; (3) cross-scale attention fusion across four pyramid levels with learnable inter-scale dependency modeling; and (4) explicit multi-scale consistency constraints that stabilize training and improve generalization. Evaluated on 8,000 railway surveillance images spanning four environmental conditions, HybridSeg achieves 92.34 ± 0.25% mIoU and 97.82 ± 0.12% pixel accuracy at 38.52 FPS with 45.28M parameters, outperforming state-of-the-art CNN, Transformer, and Mamba methods by 1.61 to 3.16% in accuracy while delivering 2.31× faster inference than comparable approaches. The architecture demonstrates robust cross-domain transfer (89.53% CDR) and competitive performance on Cityscapes (85.73%), CamVid (87.25%), and ADE20K (48.53%), supporting practical deployment in safety-critical edge applications.
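The abstract does not spell out how four-directional scanning works, so the following is only a minimal illustrative sketch and not the authors' implementation. It assumes the four directions are row-major, column-major, and the reverse of each, a common choice in 2D state space designs. The function names `four_direction_scan` and `merge_scans` are hypothetical, and merging by averaging is likewise an assumption. A real block would run a selective scan over each sequence before merging; that step is omitted here.

```python
import numpy as np


def four_direction_scan(feature_map: np.ndarray) -> np.ndarray:
    """Flatten an (H, W, C) feature map into four (H*W, C) sequences.

    Assumed directions: row-major, column-major, and the reverse of each.
    """
    h, w, c = feature_map.shape
    row_major = feature_map.reshape(h * w, c)
    col_major = feature_map.transpose(1, 0, 2).reshape(h * w, c)
    return np.stack([row_major, col_major, row_major[::-1], col_major[::-1]])


def merge_scans(sequences: np.ndarray, height: int, width: int) -> np.ndarray:
    """Undo each traversal and average the four results into (H, W, C).

    Averaging is an assumption; the sequence model that would normally
    process each direction before this step is left out.
    """
    c = sequences.shape[-1]
    row_fwd = sequences[0].reshape(height, width, c)
    col_fwd = sequences[1].reshape(width, height, c).transpose(1, 0, 2)
    row_bwd = sequences[2][::-1].reshape(height, width, c)
    col_bwd = sequences[3][::-1].reshape(width, height, c).transpose(1, 0, 2)
    return (row_fwd + col_fwd + row_bwd + col_bwd) / 4.0


# Toy 2x3 single-channel map with values 0..5.
feature_map = np.arange(6, dtype=np.float64).reshape(2, 3, 1)
scans = four_direction_scan(feature_map)
restored = merge_scans(scans, height=2, width=3)
```

Because no sequence model sits between the scan and the merge, `restored` equals the input here. The point of the sketch is only that each pixel appears in four differently ordered sequences, so a linear-time recurrent scan over each one can carry context to it from every side of the image.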

Version published to 10.21203/rs.3.rs-8533153/v1 on Research Square
Jan 13, 2026
