A Spatiotemporal Bidirectional Mamba Network with Global–Local Skeletal Enhancement for 3D Human Pose Estimation

Chuhan Wu
Zan Wang
Guixian Zhou
Jiahao Hua
Lianke Shi


Abstract

3D human pose estimation (HPE) is a cornerstone task in computer vision with diverse applications, and lifting 2D pose sequences to 3D representations has attracted significant interest. Transformer-based approaches have demonstrated robust performance but are hampered by quadratic computational complexity and insufficient bidirectional modeling capabilities. The recently introduced Mamba model mitigates these limitations through state-space models (SSMs), which offer linear complexity and effective modeling of long-range dependencies; however, it falls short in modeling the local skeletal interactions essential for human motion. To address this, we present BSTMamba, a bidirectional spatiotemporal SSM architecture designed specifically for monocular 3D HPE. BSTMamba integrates efficient global sequence modeling with localized convolutions and dynamic gating mechanisms to capture intricate spatiotemporal dependencies. For enhanced robustness and generalization, we introduce DisruptEnhance, a residual-compensated joint-order perturbation module that randomly disrupts joint orders at both global (full-skeleton) and local (body-part) scales, followed by feature compensation via a lightweight residual subnet. Comprehensive evaluations on the Human3.6M and MPI-INF-3DHP datasets show that BSTMamba attains state-of-the-art accuracy while requiring fewer parameters and fewer multiply-accumulate operations (MACs) than prior methods.
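To make the joint-order perturbation idea concrete, here is a minimal NumPy sketch of shuffling joints at the global (full-skeleton) and local (body-part) scales. It is an illustration under stated assumptions, not the authors' DisruptEnhance implementation: the 17-joint grouping in `BODY_PARTS`, the function names, and the `(frames, joints, channels)` layout are all hypothetical, and the residual compensation subnet is omitted because the abstract does not specify it.

```python
import numpy as np

# Hypothetical grouping of a 17-joint skeleton into body parts.
# The preprint's actual partition is not given in the abstract.
BODY_PARTS = {
    "torso": [0, 7, 8, 9, 10],
    "right_leg": [1, 2, 3],
    "left_leg": [4, 5, 6],
    "left_arm": [11, 12, 13],
    "right_arm": [14, 15, 16],
}


def global_permutation(num_joints, rng):
    """Random order over all joints (full-skeleton scale)."""
    return rng.permutation(num_joints)


def local_permutation(num_joints, parts, rng):
    """Random order that only shuffles joints within each body part."""
    perm = np.arange(num_joints)
    for idx in parts.values():
        idx = np.asarray(idx)
        perm[idx] = rng.permutation(idx)
    return perm


def disrupt(x, perm):
    """Reorder the joint axis of x, shaped (frames, joints, channels)."""
    return x[:, perm, :]


def restore(x, perm):
    """Undo disrupt() by applying the inverse permutation."""
    return x[:, np.argsort(perm), :]


rng = np.random.default_rng(0)
pose_seq = rng.standard_normal((4, 17, 2))  # 4 frames, 17 joints, 2D coords

g = global_permutation(17, rng)
l = local_permutation(17, BODY_PARTS, rng)

# Both perturbations are lossless: restoring recovers the original order.
assert np.allclose(restore(disrupt(pose_seq, g), g), pose_seq)
assert np.allclose(restore(disrupt(pose_seq, l), l), pose_seq)
```

In this sketch the perturbation only reorders the joint axis, so a downstream sequence model would see the same joints in a different scan order; how the compensated features are merged back is left open here.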

Version published to 10.21203/rs.3.rs-7477209/v1 on Research Square
Sep 4, 2025

Related articles

Diving Performance Analysis with 3D Motion Knowledge Hypergraphs

This article has 4 authors:
1. Jingbo Wang
2. Yifan Xie
3. Yitao Xie
4. Hongyu Xiao
This article has no evaluations. Latest version: Sep 8, 2025

UREPTrack: Unified RGB-Event Visual Tracking via PoolFormer Backbone

This article has 1 author:
1. Min Lu
This article has no evaluations. Latest version: Sep 24, 2025

DMLNet: Densely Connected and Multi-Scale Lightweight High-Resolution Human Pose Estimation Network

This article has 5 authors:
1. Chunsheng Zhang
2. Wanggen Li
3. Cheng Wang
4. Yuchen Li
5. Shangshu Gao
This article has no evaluations. Latest version: Oct 25, 2025
