VMamba-Rail: Adaptive State Space Segmentation with Reinforcement Learning for Edge-Deployed Railway Safety Monitoring

Abstract

Edge-deployed railway safety monitoring demands highly accurate semantic segmentation under tight constraints: missed detections threaten operational safety, yet trackside devices must handle diverse environmental conditions (fog, rain, nighttime) at over 30 FPS with fewer than 50M parameters. Conventional CNNs lack the long-range context modeling needed to detect distant obstacles, while Transformer-based methods incur self-attention cost that grows quadratically with the number of image tokens, which is prohibitive for edge deployment. We present HybridSeg, a novel architecture that reformulates visual state space modeling as a controllable Markov Decision Process, enabling context-adaptive information propagation through reinforcement learning. Our approach integrates: (1) meta-learned state space dynamics via Proximal Policy Optimization (PPO), in which learned policies dynamically adjust state transition parameters based on gradient feedback and hidden-state statistics, the first application of RL-based parameter adaptation to visual state space models; (2) Structure-Aware Deformable Mamba blocks that combine four-directional scanning with deformable spatial attention to handle irregular geometry; (3) cross-scale attention fusion across four pyramid levels with learnable inter-scale dependency modeling; and (4) explicit multi-scale consistency constraints that stabilize training and improve generalization. Evaluated on 8,000 railway surveillance images spanning four environmental conditions, HybridSeg achieves 92.34 ± 0.25% mIoU and 97.82 ± 0.12% pixel accuracy at 38.52 FPS with 45.28M parameters, outperforming state-of-the-art CNN, Transformer, and Mamba methods by 1.61 to 3.16% in accuracy while delivering 2.31× faster inference than comparable approaches. The architecture also demonstrates robust cross-domain transfer (89.53% CDR) and competitive performance on Cityscapes (85.73%), CamVid (87.25%), and ADE20K (48.53%), supporting its practical deployment in safety-critical edge applications.
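The efficiency argument above rests on state space models scanning a flattened feature map in linear time, and the Structure-Aware Deformable Mamba blocks scan in four directions. The following is a minimal sketch under simplifying assumptions, not the authors' implementation: a scalar state with fixed parameters `a`, `b`, `c`, a simplified discretization (exponential decay on the state, a plain `delta * b` step on the input), and outputs from the four traversals averaged back onto the grid. The step size `delta` stands in for the kind of "state transition parameter" an RL policy could adjust; that adaptation and the deformable attention are not shown. The names `ssm_scan` and `four_direction_scan` are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def ssm_scan(seq: List[float], delta: float, a: float = 1.0,
             b: float = 1.0, c: float = 1.0) -> List[float]:
    """Scan a scalar discretized SSM over a 1-D sequence in O(len(seq)) time.

    Recurrence: h_t = exp(-delta * a) * h_{t-1} + delta * b * x_t,  y_t = c * h_t.
    """
    decay = math.exp(-delta * a)  # per-step state retention factor
    h = 0.0
    out: List[float] = []
    for x in seq:
        h = decay * h + delta * b * x
        out.append(c * h)
    return out


def four_direction_scan(grid: Grid, delta: float = 0.5) -> Grid:
    """Scan a 2-D map along four traversal orders and average the results."""
    rows, cols = len(grid), len(grid[0])
    row_major: List[Tuple[int, int]] = [(i, j) for i in range(rows) for j in range(cols)]
    col_major: List[Tuple[int, int]] = [(i, j) for j in range(cols) for i in range(rows)]
    # Forward and backward raster orders along rows and along columns.
    orders = [row_major, row_major[::-1], col_major, col_major[::-1]]
    merged = [[0.0] * cols for _ in range(rows)]
    for order in orders:
        ys = ssm_scan([grid[i][j] for i, j in order], delta)
        # Write each output back to the pixel it came from, averaging over directions.
        for (i, j), y in zip(order, ys):
            merged[i][j] += y / len(orders)
    return merged


if __name__ == "__main__":
    feat = [[1.0, 0.0], [0.0, 0.0]]  # a single active pixel in the top-left corner
    out = four_direction_scan(feat)
    print(out[1][1] > 0)  # context from the top-left reaches the opposite corner
```

Each traversal costs time proportional to the number of pixels, so four scans remain linear in image size, in contrast to the quadratic pairwise cost of full self-attention.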
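The abstract casts state-parameter adaptation as a Markov Decision Process optimized with PPO, but this summary does not specify the action space, reward, or policy architecture. The sketch below shows only the standard PPO clipped surrogate objective, paired with a hypothetical discrete action space in which each action rescales the SSM step size `delta`. `DELTA_SCALES`, `adjust_delta`, its clamp bounds, and the assumption that advantages come from some segmentation-quality reward are illustrative choices, not the authors' design.

```python
import math
from typing import Sequence

# Hypothetical discrete action space: multiplicative adjustments to the SSM step size.
DELTA_SCALES = (0.5, 1.0, 2.0)


def adjust_delta(delta: float, action: int, lo: float = 1e-3, hi: float = 10.0) -> float:
    """Apply a discrete policy action to the step size and clamp it to [lo, hi]."""
    return min(hi, max(lo, delta * DELTA_SCALES[action]))


def ppo_clip_objective(new_logp: Sequence[float], old_logp: Sequence[float],
                       advantages: Sequence[float], eps: float = 0.2) -> float:
    """PPO clipped surrogate objective (to be maximized).

    L = mean_t min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t),
    where r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t) = exp(new_logp - old_logp).
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # The minimum removes any gain from pushing the ratio outside [1 - eps, 1 + eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, a probability ratio of 2 is credited only up to 1 + eps; with a negative advantage the unclipped, more pessimistic term is kept. This is what keeps each policy update close to the previous policy.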
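The abstract does not give the exact form of its multi-scale consistency constraints. One plausible reading, assumed here, penalizes disagreement between each coarse pyramid-level prediction and the full-resolution prediction average-pooled to the same size, using mean squared error averaged over levels. The helper names and the choice of average pooling and MSE are hypothetical; maps are single-channel probability grids whose sizes are assumed to divide evenly.

```python
from typing import List

Grid = List[List[float]]


def avg_pool(grid: Grid, k: int) -> Grid:
    """Non-overlapping k x k average pooling."""
    rows, cols = len(grid), len(grid[0])
    if rows % k or cols % k:
        raise ValueError("grid size must be divisible by the pooling factor")
    return [[sum(grid[i * k + di][j * k + dj] for di in range(k) for dj in range(k)) / (k * k)
             for j in range(cols // k)]
            for i in range(rows // k)]


def mse(a: Grid, b: Grid) -> float:
    """Mean squared error between two grids of identical shape."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("shape mismatch")
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def multiscale_consistency_loss(full_res: Grid, coarse_preds: List[Grid]) -> float:
    """Average MSE between each coarse prediction and the pooled full-resolution map."""
    total = 0.0
    for pred in coarse_preds:
        k = len(full_res) // len(pred)  # downsampling factor of this pyramid level
        total += mse(avg_pool(full_res, k), pred)
    return total / len(coarse_preds)
```

In training, a term like this would be added to the main segmentation loss with some weight, discouraging pyramid levels from making contradictory predictions about the same region.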
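The reported mIoU and pixel accuracy follow standard definitions. Per-class IoU is TP / (TP + FP + FN), computed from a confusion matrix. mIoU averages this value over classes, and pixel accuracy is the fraction of correctly labeled pixels. The sketch assumes flat lists of integer labels, no ignore label, and that classes absent from both prediction and ground truth are excluded from the mean. Benchmark toolkits differ on these details, and the abstract does not specify the authors' evaluation code.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int) -> Matrix:
    """cm[g][p] counts pixels with ground-truth class g predicted as class p."""
    if len(gt) != len(pred):
        raise ValueError("label sequences must have equal length")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def pixel_accuracy(cm: Matrix) -> float:
    """Correctly labeled pixels (the diagonal) divided by all pixels."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def mean_iou(cm: Matrix) -> float:
    """Mean of per-class IoU = TP / (TP + FP + FN), skipping classes never seen."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    if not ious:
        raise ValueError("no classes present")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
    print(pixel_accuracy(cm), mean_iou(cm))
```

Because IoU also penalizes false positives, mIoU is typically well below pixel accuracy when small classes such as distant obstacles are segmented poorly, which is consistent with the gap between the two figures reported above.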

Article activity feed