Lightning Pose 3D: an uncertainty-aware framework for data-efficient multi-view animal pose estimation
Abstract
Multi-view pose estimation is essential for quantifying animal behavior in scientific research, yet current methods struggle to achieve accurate tracking with limited labeled data and suffer from poor uncertainty estimates. We address these challenges with a flexible framework that can operate with or without camera calibration, combining novel training and post-processing techniques with an uncertainty-aware pseudo-labeling distillation procedure. Our multi-view model processes all camera views jointly using a pretrained vision transformer backbone, and a simulated occlusion technique encourages the model to learn robust cross-view correspondences without requiring camera parameters. When camera parameters are available, 3D data augmentations and a triangulation-based loss further encourage geometric consistency. We extend the Ensemble Kalman Smoother (EKS) post-processor to the nonlinear case, leveraging camera geometry, and introduce a variance inflation technique that detects cross-view inconsistencies and corrects overconfident predictions. We validate our approach on five datasets spanning three species (fly, mouse, bird), including a multi-animal dataset with two visually distinct individuals; the proposed pipeline consistently outperforms existing methods across datasets. We demonstrate how these improvements translate to downstream scientific analyses using data from the International Brain Laboratory, showing improved unsupervised behavioral clustering and neural decoding of paw kinematics with just 200 labeled frames. To facilitate adoption, we developed a browser-based, cloud-compatible user interface that supports the full life cycle of multi-view pose estimation, from labeling and model training to post-processing with EKS and diagnostic visualizations.
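As a rough illustration of the geometric ideas mentioned in the abstract (triangulation, cross-view consistency, and inflating the variance of inconsistent predictions), the following minimal sketch triangulates one keypoint from several calibrated views, measures the per-view reprojection error, and scales per-view variances when that error is large. It is not the authors' implementation: the function names, the toy camera matrices, the `tol_px` threshold, and the quadratic inflation rule are all assumptions made for this example, and it is written in plain Python only to stay dependency-free.

```python
import math


def project(P, X):
    """Project a 3D point X = (x, y, z) with a 3x4 camera matrix P to a pixel (u, v)."""
    Xh = (*X, 1.0)
    h = [sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3)]
    return h[0] / h[2], h[1] / h[2]


def solve3(M, b):
    """Solve the 3x3 system M x = b by Gaussian elimination with partial pivoting."""
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 4):
                A[r][c] -= f * A[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (A[i][3] - sum(A[i][c] * x[c] for c in range(i + 1, 3))) / A[i][i]
    return x


def triangulate(Ps, pixels):
    """Linear least-squares triangulation of one keypoint from two or more views."""
    rows, rhs = [], []
    for P, (u, v) in zip(Ps, pixels):
        for coord, r in ((u, 0), (v, 1)):
            # From coord = (P[r] . Xh) / (P[2] . Xh): one linear equation in (x, y, z).
            rows.append([coord * P[2][c] - P[r][c] for c in range(3)])
            rhs.append(P[r][3] - coord * P[2][3])
    # Normal equations (A^T A) x = A^T b.
    AtA = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(row[i] * b for row, b in zip(rows, rhs)) for i in range(3)]
    return solve3(AtA, Atb)


def reprojection_errors(Ps, pixels):
    """Return the triangulated point and the per-view reprojection error in pixels."""
    X = triangulate(Ps, pixels)
    errs = []
    for P, (u, v) in zip(Ps, pixels):
        pu, pv = project(P, X)
        errs.append(math.hypot(pu - u, pv - v))
    return X, errs


def inflate_variances(variances, errs, tol_px=5.0):
    """Hypothetical rule: scale a view's variance by max(1, (err / tol_px)^2)."""
    return [v * max(1.0, (e / tol_px) ** 2) for v, e in zip(variances, errs)]


# Toy setup (assumed): three identical pinhole cameras with shifted centers.
f, cx, cy = 500.0, 320.0, 240.0
Ps = [
    [[f, 0, cx, 0.0], [0, f, cy, 0.0], [0, 0, 1, 0.0]],  # center (0, 0, 0)
    [[f, 0, cx, -f], [0, f, cy, 0.0], [0, 0, 1, 0.0]],   # center (1, 0, 0)
    [[f, 0, cx, 0.0], [0, f, cy, -f], [0, 0, 1, 0.0]],   # center (0, 1, 0)
]
X_true = [0.2, -0.1, 4.0]
clean = [project(P, X_true) for P in Ps]

# Simulate an inconsistent prediction: shift the keypoint by 30 px in view 2.
noisy = list(clean)
noisy[1] = (clean[1][0] + 30.0, clean[1][1])

variances = [1.0, 1.0, 1.0]
_, clean_errs = reprojection_errors(Ps, clean)
_, noisy_errs = reprojection_errors(Ps, noisy)
print("clean errors:", clean_errs, inflate_variances(variances, clean_errs))
print("noisy errors:", noisy_errs, inflate_variances(variances, noisy_errs))
```

In this sketch, consistent predictions reproject almost exactly and keep their original variances, while the corrupted view set yields a large reprojection error and therefore larger variances. A downstream smoother could then place less weight on those frames; how the actual pipeline detects inconsistencies and chooses the inflation factor is not specified in the abstract.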