Data-driven Sampling Strategies for Fine-Tuning Bird Detection Models

Abstract

Passive Acoustic Monitoring (PAM) has emerged as a promising tool for collecting ecological data, particularly for monitoring bird populations. Bird species can be identified automatically using pre-trained models such as BirdNET. The performance of these models can be significantly improved by fine-tuning them on annotated samples recorded under the same acoustic conditions in which the microphones are deployed. However, PAM collects vast amounts of data, and annotating bird vocalizations requires specialized expertise. As a result, only a very small portion of the recordings can be labeled in practice. Selecting the most relevant samples to annotate, so as to maximize the benefit of fine-tuning, remains a significant challenge. This work addresses that challenge in several steps. First, a regularization technique addresses class imbalance during model fine-tuning. Next, a data-driven methodology is developed, introducing the influence score, which quantifies the impact of individual training samples on model performance and thereby informs sampling strategies. A linear model is proposed to estimate the influence score so that it generalizes to unseen data. Finally, several sampling strategies, based on acoustic indices and on predictions of the pre-trained model, are compared. Together, these contributions enable the identification of efficient annotation strategies that overcome the challenge of limited annotation resources in large-scale passive acoustic monitoring.
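To make the idea of a linear influence-score estimator more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the features, the fitting procedure, or the selection rule, so everything here is an assumption: each sample is described by two hypothetical features (an acoustic index and the pre-trained model's confidence), the linear model is fitted by ordinary least squares, and the annotation budget is spent on the candidates with the highest predicted influence. All function names and all numbers are made up for illustration.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b]; copying keeps the inputs unchanged.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        residual = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = residual / m[i][i]
    return x


def fit_linear_influence(
    features: Sequence[Sequence[float]], scores: Sequence[float]
) -> List[float]:
    """Least-squares fit of score ~ w0 + w . features (normal equations)."""
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept term
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * s for r, s in zip(rows, scores)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict_influence(weights: Sequence[float], feature: Sequence[float]) -> float:
    """Predicted influence score for one sample."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature))


def select_for_annotation(
    weights: Sequence[float], candidates: Sequence[Sequence[float]], budget: int
) -> List[int]:
    """Indices of the `budget` candidates with the highest predicted influence."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: predict_influence(weights, candidates[i]),
        reverse=True,
    )
    return ranked[:budget]


# Toy data (invented): (acoustic_index, pretrained_confidence) per sample,
# with influence scores assumed to have been measured on annotated samples.
train_features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
train_scores = [0.1, 0.6, -0.2, 0.3, 0.29]

weights = fit_linear_influence(train_features, train_scores)

# Unlabeled candidates described by the same two hypothetical features.
candidates = [(0.2, 0.9), (0.9, 0.1), (0.5, 0.5), (0.8, 0.6)]
selected = select_for_annotation(weights, candidates, budget=2)
print("weights:", weights)
print("selected candidate indices:", selected)
```

In this sketch the estimator only needs features that can be computed without labels, which is what allows it to rank recordings that have not been annotated yet. Whether a top-k rule or some other selection scheme is appropriate is a design choice that the abstract leaves open.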
