Higher Performance Full-Body Tracking Method by Integrating Multiple Tracking Techniques Based on Deep Latent Space
Abstract
In this paper, we propose the Deep Latent Space Assimilation Model (D-LSAM), a novel framework that integrates multiple body-tracking techniques in XR environments to achieve more precise, real-time motion capture. Inside-Out Body Tracking (IOBT) on VR headsets can accurately track upper-body and finger movements, yet it struggles to capture areas outside the cameras' field of view, particularly the lower body. External-camera or smartphone-based systems, by contrast, can observe the entire body but often suffer from latency or reduced accuracy. D-LSAM addresses these limitations by combining a Wasserstein autoencoder for pose compression, a Transformer-driven Latent Time-Stepping module for motion prediction, and a cross-attention gating mechanism that adaptively fuses data from multiple sources. Experimental results show that D-LSAM outperforms both extended Kalman filter- and particle filter-based methods in short- to mid-term motion forecasting. Future work will focus on faster inference, improved handling of rapid movements, and support for a wider range of devices. Progress in this methodology promises more immersive XR applications and advances in fields such as medicine, sports, and rehabilitation.
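To make the architecture described above concrete, the following is a minimal PyTorch-style sketch of the fusion idea: a latent pose encoder/decoder standing in for the Wasserstein autoencoder, a Transformer over the latent history standing in for the Latent Time-Stepping module, and a cross-attention gate that assimilates delayed external observations. All module names, dimensions, and the gating formulation are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the D-LSAM fusion idea; shapes and modules are assumptions.
import torch
import torch.nn as nn


class LatentFusionSketch(nn.Module):
    """Compress poses to a latent space, predict the next latent state with a
    Transformer, and fuse a delayed external observation via cross-attention gating."""

    def __init__(self, pose_dim: int = 69, latent_dim: int = 32, n_heads: int = 4):
        super().__init__()
        # Stand-in for the Wasserstein autoencoder's encoder/decoder pair.
        self.encoder = nn.Sequential(nn.Linear(pose_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, pose_dim))
        # Stand-in for the Latent Time-Stepping module (Transformer over the latent history).
        layer = nn.TransformerEncoderLayer(d_model=latent_dim, nhead=n_heads, batch_first=True)
        self.time_stepper = nn.TransformerEncoder(layer, num_layers=2)
        # Cross-attention: the predicted latent queries attend over external-sensor latents.
        self.cross_attn = nn.MultiheadAttention(latent_dim, n_heads, batch_first=True)
        # Gate deciding how much of the external evidence to assimilate.
        self.gate = nn.Sequential(nn.Linear(2 * latent_dim, latent_dim), nn.Sigmoid())

    def forward(self, iobt_poses: torch.Tensor, external_poses: torch.Tensor) -> torch.Tensor:
        # iobt_poses:     (batch, T, pose_dim) headset IOBT history (upper body, low latency)
        # external_poses: (batch, S, pose_dim) external-camera observations (full body, delayed)
        z_hist = self.encoder(iobt_poses)                  # (B, T, latent)
        z_pred = self.time_stepper(z_hist)[:, -1:, :]      # predicted next latent state, (B, 1, latent)
        z_ext = self.encoder(external_poses)               # (B, S, latent)
        z_obs, _ = self.cross_attn(z_pred, z_ext, z_ext)   # evidence gathered from the external source
        g = self.gate(torch.cat([z_pred, z_obs], dim=-1))  # adaptive per-dimension gate in [0, 1]
        z_fused = g * z_obs + (1.0 - g) * z_pred           # assimilated latent state
        return self.decoder(z_fused).squeeze(1)            # reconstructed full-body pose


if __name__ == "__main__":
    model = LatentFusionSketch()
    out = model(torch.randn(2, 16, 69), torch.randn(2, 8, 69))
    print(out.shape)  # torch.Size([2, 69])
```

The gate lets the model lean on the Transformer prediction when external observations are stale and on the external stream when it is fresh and reliable, which is the adaptive-fusion behavior the abstract describes.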