Optimizing Intermediate Representations: A Framework for Low-Cost, High-Accuracy Behavior Quantification

Abstract

Quantitative measurement of animal behavior is a cornerstone of neuroscience, genetics, and ethology. While modern computer vision has democratized automated analysis, the field has coalesced around pose estimation as the standard intermediate representation. This reliance imposes a significant bottleneck: researchers must often train custom pose models using large, labor-intensive datasets. Furthermore, the assumption that denser anatomical tracking yields better classification remains largely unverified. Here, we benchmark intermediate representations for supervised mouse behavior classification to determine the optimal trade-off between annotation cost and model performance. We systematically evaluate the sensitivity of classification to keypoint density, the impact of temporal feature engineering, and the viability of segmentation-derived shape descriptors as a low-cost alternative. We find that classifier performance is remarkably robust to keypoint variation; increasing keypoint density yields negligible gains, particularly when behavior training sets are sufficiently large. In contrast, augmenting models with temporal features (specifically FFT-based signal processing) consistently drives performance improvements. Crucially, we demonstrate that whole-body segmentation achieves performance parity with explicit pose estimation across most behaviors. These findings challenge the "more is better" intuition in pose tracking and suggest a paradigm shift: efficient pipelines should prioritize behavioral dataset volume and temporal dynamics over complex anatomical keypoints.
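
The abstract names FFT-based temporal features but does not say how they are computed. As a rough illustration only, the sketch below derives per-frame spectral features from a single per-frame signal, such as one keypoint coordinate or one segmentation-derived shape descriptor. The function name `fft_window_features`, the window length, the frame rate, the number of bands, and the choice of band power plus peak frequency as outputs are all assumptions made for this example, not the authors' method.

```python
import numpy as np


def fft_window_features(signal, fps=30.0, window=32, n_bands=4):
    """Per-frame spectral features from a 1-D per-frame signal.

    Hypothetical helper: the name, parameters, and defaults are
    illustrative assumptions, not taken from the article.
    Returns an array of shape (n_frames, n_bands + 2) holding the
    band powers, the total non-DC power, and the peak frequency in Hz.
    """
    signal = np.asarray(signal, dtype=float)
    half = window // 2
    # Reflect-pad so that every frame gets a full window around it.
    padded = np.pad(signal, (half, window - half - 1), mode="reflect")
    # Shape (n_frames, window): one sliding window per frame.
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    # Subtract each window's mean so slow offsets do not dominate.
    windows = windows - windows.mean(axis=1, keepdims=True)
    # Hann taper reduces spectral leakage; keep the power spectrum.
    spectrum = np.abs(np.fft.rfft(windows * np.hanning(window), axis=1)) ** 2
    freqs = np.fft.rfftfreq(window, d=1.0 / fps)
    # Sum power into roughly equal-width bands, skipping the DC bin.
    bands = np.array_split(np.arange(1, len(freqs)), n_bands)
    band_power = np.stack([spectrum[:, b].sum(axis=1) for b in bands], axis=1)
    total_power = spectrum[:, 1:].sum(axis=1)
    peak_freq = freqs[1:][np.argmax(spectrum[:, 1:], axis=1)]
    return np.column_stack([band_power, total_power, peak_freq])
```

In a pipeline of the kind the abstract describes, features like these would be computed for each tracked signal and concatenated with the static per-frame features before classification. The frequency resolution is `fps / window`, so a longer window separates slow rhythmic behaviors more finely at the cost of temporal precision.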
