BirdAVES in the wild: individual recognition as a step toward zebra finch communication networks
Abstract
Understanding who communicates with whom, when, and how is central to the ecology of group-living animals, yet acoustic identification of individual animals in their natural environment remains challenging. Zebra finches are a model species whose vocal behaviour has been studied predominantly indoors; here we address the outdoor setting and investigate bioacoustic deep learning for individual identification at scale, a key step toward building communication networks from field recordings. We fine-tune BirdAVES to recognize 173 zebra finch individuals from short (1-3 s) clips using a concise training recipe: two-phase training, weighted sampling and class-weighted cross-entropy to handle long-tailed per-individual clip counts, and a supervised contrastive term that pulls same-individual embeddings together. On a real-world dataset (2,915 clips, 173 individuals), the selected model achieved a macro-F1 of 0.733 on the validation set and 0.726 on the test set, with steep retrieval gains on the test set (Top-5 = 0.868, Top-10 = 0.893). Performance at this level enables conversion of hours of audio into “who-sang-when” timelines. We deliberately report top-k performance because it quantifies review effort and supports human-in-the-loop workflows by shrinking the number of clips an expert must audit. Although a gap between training and validation/test performance remains, reflecting the short windows and class imbalance, the embeddings are discriminative and immediately useful. Key next steps are to address the imbalance in our data, to scale to a significantly larger set of individuals, and to translate individual recognition into communication or social networks.
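The metrics named in the abstract (macro-F1 and top-k accuracy) and the idea of class weighting for long-tailed counts can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names, the toy scores, and the inverse-frequency weighting scheme are assumptions chosen for illustration, since the abstract does not specify how the class weights were computed.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1/count (an assumed scheme).

    Scaled so that a class with the average clip count gets weight 1.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {c: total / (n_classes * n) for c, n in counts.items()}


def top_k_accuracy(scores, labels, k):
    """Fraction of clips whose true individual is among the k top-scoring classes.

    Ties are broken in favour of the lower class index (sorted() is stable).
    """
    hits = 0
    for row, true in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true in ranked[:k]:
            hits += 1
    return hits / len(labels)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    f1s = []
    for c in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Toy example: 4 clips, 3 hypothetical individuals (classes 0, 1, 2).
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
]
labels = [0, 2, 1, 1]
# Top-1 prediction = index of the highest score in each row.
preds = [max(range(len(row)), key=lambda c: row[c]) for row in scores]

weights = inverse_frequency_weights(labels)
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
f1 = macro_f1(labels, preds)
print(weights, top1, top2, round(f1, 4))
```

In this toy case the third clip is misclassified at top-1 but recovered at top-2, which mirrors the point made in the abstract: a reviewer who inspects only the k best candidates per clip still finds the correct individual more often than the top-1 score alone suggests. Macro-F1 averages per-individual F1 without weighting by clip count, so rarely recorded individuals count as much as frequently recorded ones.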