Validation of the Transformer-Based Monocular System (Capture4D): A Real-time Kinematic Analysis in Coaching/Teaching Tennis
Abstract
Human motion capture is important in many fields, but traditional optical motion capture (OMC) systems are costly and restrictive. Monocular video-based methods are more accessible, yet they face accuracy challenges, especially in dynamic sports such as tennis. This study validates Capture4D, a novel Transformer-based monocular system, for capturing a wide range of tennis strokes. We developed a universal biomechanical analysis framework (K0-K5) applicable to twelve fundamental stroke types. To demonstrate the system's capabilities, this paper focuses on a detailed validation using the tennis serve as a representative example. We conducted experiments with 9 high-level tennis players, capturing motion data simultaneously with Capture4D (a single RGB camera) and a Qualisys OMC system (the gold standard). Accuracy was evaluated by comparing 3D joint coordinates and joint angles using the Normalized Mean Per Joint Position Error (NMPJPE), root mean square error (RMSE), and mean absolute error (MAE). The results showed that Capture4D effectively captured the players' motion, with average NMPJPE for tennis serves ranging from 69.5 mm to 88.3 mm, within the acceptable range (70-130 mm) for coaching purposes. Compared with OMC, Capture4D produced comparable joint angle trajectories, with advantages in operational convenience, cost-effectiveness, and breadth of applicability. It reduced setup time by approximately 50% and cost by approximately 80%. Capture4D is therefore a valid and practical monocular motion capture solution for tennis coaching and for broader applications in sports. While slightly less precise than OMC, its accuracy is acceptable for many use cases in coaching and teaching. Its convenience and low cost pave the way for accessible motion analysis in settings where OMC cannot be used, such as outdoor environments and multi-person scenarios. This technology holds promise for democratizing motion capture in sports training, coaching, and teaching.
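To make the three accuracy metrics concrete, the following is a minimal sketch of how they could be computed. The abstract does not define the normalization used in NMPJPE, so the sketch assumes one common convention: both poses are centered on a root joint and the estimated pose is rescaled by the least-squares optimal scale before the mean per-joint distance is taken. The function names, the choice of root joint, and all sample coordinates are hypothetical and are not taken from the study.

```python
import math

def mpjpe(pred, ref):
    """Mean Euclidean distance between corresponding joints (units follow the input)."""
    return sum(math.dist(p, r) for p, r in zip(pred, ref)) / len(ref)

def nmpjpe(pred, ref, root=0):
    """MPJPE after root-centering both poses and rescaling pred (assumed convention)."""
    # Center each pose on its root joint to remove global translation.
    pc = [[a - b for a, b in zip(p, pred[root])] for p in pred]
    rc = [[a - b for a, b in zip(r, ref[root])] for r in ref]
    # Least-squares scale s minimizing the sum of squared differences ||s * pc - rc||^2.
    num = sum(a * b for p, r in zip(pc, rc) for a, b in zip(p, r))
    den = sum(a * a for p in pc for a in p)
    scale = num / den if den else 1.0
    scaled = [[scale * a for a in p] for p in pc]
    return mpjpe(scaled, rc)

def rmse(pred, ref):
    """Root mean square error between two equal-length series, e.g. joint angles."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def mae(pred, ref):
    """Mean absolute error between two equal-length series."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

# Made-up three-joint reference pose in millimetres (not data from the study).
ref_pose = [(0.0, 0.0, 0.0), (0.0, 0.0, 500.0), (300.0, 0.0, 500.0)]
# A hypothetical estimate that differs from the reference only by scale and offset.
est_pose = [(2 * x + 10.0, 2 * y - 5.0, 2 * z + 1.0) for x, y, z in ref_pose]

print(mpjpe(est_pose, ref_pose))   # large: raw positions disagree
print(nmpjpe(est_pose, ref_pose))  # near zero: shape agrees after normalization
```

Under this assumed convention, NMPJPE ignores global offset and overall scale, which a single camera cannot recover reliably, and reports only disagreement in pose shape. RMSE and MAE would be applied to per-frame joint angle series from the two systems.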