FERAL: A Video-Understanding System for Direct Video-to-Behavior Mapping
Abstract
Animal behavior unfolds continuously in time, yet quantitative analyses often require segmenting it into discrete, interpretable states. Although manual annotation can achieve this, it remains slow, subjective, and difficult to scale. Most automated pipelines infer actions from tracked body parts, but they are limited by tracking quality and discard much of the visual information contained in raw video. Here we present FERAL (Feature Extraction for Recognition of Animal Locomotion), a supervised video-understanding toolkit that bridges this gap by mapping raw video directly to frame-level behavioral labels, bypassing the need for pose estimation. Across benchmarks, FERAL outperforms state-of-the-art pose- and video-based baselines: on a benchmark dataset of mouse social interaction, it surpasses Google’s VideoPrism using just a quarter of the training data. FERAL generalizes across species, recording conditions, and levels of behavioral organization, from single-animal locomotion to complex social interactions and emergent collective dynamics. Released as a user-friendly, open-source package, FERAL overcomes the challenges of traditional approaches, integrates easily with existing analysis pipelines, and can be deployed locally or on cloud servers with a few clicks. By mapping raw video directly to annotated behavior, FERAL lowers the barrier to scalable, cross-species behavioral quantification and broadens the range of behavioral analyses possible in both the lab and the wild.