Higher Performance Full-Body Tracking Method by Integrating Multiple Tracking Techniques Based on Deep Latent Space

Abstract

In this paper, we propose the Deep Latent Space Assimilation Model (D-LSAM), a novel framework that integrates multiple body-tracking techniques in XR environments to achieve more precise, real-time motion capture. Inside-Out Body Tracking (IOBT) on VR headsets can accurately track upper-body and finger movements, yet it struggles to capture areas outside the cameras' field of view—particularly the lower body. By contrast, external-camera or smartphone-based systems can observe the entire body but often suffer from latency or reduced accuracy. D-LSAM addresses these complementary limitations by combining a Wasserstein autoencoder for pose compression, a Transformer-driven Latent Time-Stepping module for movement prediction, and a cross-attention gating mechanism that adaptively fuses data from the different sources. Experimental results confirm that D-LSAM outperforms both extended Kalman filter and particle filter-based methods in short- to mid-term motion forecasting. Future work will emphasize faster inference, improved handling of rapid movements, and support for a wider range of devices. Progress in this methodology holds promise for more immersive XR applications and for advancing fields such as medicine, sports, and rehabilitation.
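To make the fusion idea concrete, the following is a minimal NumPy sketch of a cross-attention gating step that adaptively merges two latent pose streams: headset (IOBT) latents and external-camera latents. All names, dimensions, and the sigmoid-gate formulation are illustrative assumptions for exposition, not the authors' actual D-LSAM implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_gate(z_iobt, z_ext, W_q, W_k, W_v, W_g):
    """Fuse external-camera latents into headset latents.

    z_iobt : (T, D) headset (IOBT) latent poses -- queries
    z_ext  : (S, D) external-camera latent poses -- keys/values
    W_q, W_k, W_v : (D, D) projection matrices (assumed learned)
    W_g    : (2D, D) gate projection (assumed learned)
    Returns a (T, D) fused latent sequence.
    """
    Q = z_iobt @ W_q
    K = z_ext @ W_k
    V = z_ext @ W_v
    d = Q.shape[-1]
    # Cross-attention: re-align external observations to headset frames
    attn = softmax(Q @ K.T / np.sqrt(d))
    fused_ext = attn @ V
    # Per-dimension sigmoid gate: how much external evidence to admit,
    # e.g. trust IOBT for the upper body, the external view for the legs
    g = 1.0 / (1.0 + np.exp(-(np.concatenate([z_iobt, fused_ext], axis=-1) @ W_g)))
    return g * fused_ext + (1.0 - g) * z_iobt
```

In a full pipeline, `z_iobt` and `z_ext` would come from the shared Wasserstein-autoencoder latent space, and the fused sequence would feed the Latent Time-Stepping predictor; here random matrices stand in for trained weights.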
