Improved detection of bird vocalisations using BirdNET embeddings and machine learning
Abstract
Automated bird sound recognition has become an essential tool for biodiversity monitoring, enabling large-scale species detection from audio recordings. BirdNET is a well-known deep learning model, trained on a large dataset of community-labeled recordings, that has demonstrated strong performance in identifying bird species. When it is applied to a specific case, such as a single species or geographical location, its performance can be improved through fine-tuning or by adding a posterior classification step. In this study, we investigate the detection of Eurasian Woodcock (Scolopax rusticola) calls. BirdNET embeddings are used as feature representations, and classifiers are trained on these features. A strongly labeled dataset was created by manually annotating 97 recent recordings (2023–2024) from Xeno-canto, yielding 501 positive segments and 2,505 negative segments. We also use a second dataset available from the literature. BirdNET was first evaluated on both datasets as a baseline, achieving an average precision of 84.3% and 89.3%, respectively. To improve detection accuracy, three machine learning classifiers were then trained on the embeddings: Support Vector Machine (SVM), Random Forest, and XGBoost. The results show a substantial improvement over the BirdNET baseline, with average precision reaching 99.8%–100% on both datasets. The present work therefore demonstrates that a hybrid (two-stage) approach, in which embeddings from a deep learning bird audio model are passed to posterior classifiers trained on strongly labeled data, can be a very accurate method for recognising bird species.
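Since all results above are reported as average precision, a minimal sketch of how that metric can be computed from per-segment detection scores may help in reading them. The function name, the toy labels, and the toy scores below are illustrative assumptions and are not taken from the study; only the segment counts (501 positive, 2,505 negative) come from the abstract. The sketch uses the common rank-based definition and does not treat tied scores specially, so it may differ slightly from the implementation the authors used.

```python
def average_precision(labels, scores):
    """Average precision for binary labels (1 = target call, 0 = other).

    Segments are ranked by descending score; AP is the mean of the
    precision values measured at each rank where a positive appears.
    Tied scores are not handled specially in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


# Hypothetical toy scores for five segments (not data from the study).
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_ap = average_precision(toy_labels, toy_scores)

# Share of positives in the annotated Xeno-canto set (501 vs 2,505 segments).
prevalence = 501 / (501 + 2505)

print(f"toy AP: {toy_ap:.3f}")
print(f"positive prevalence: {prevalence:.3f}")
```

The positive prevalence of the annotated set is about 0.167. A detector that ranked segments at random would be expected to score an average precision close to that value, which gives a rough floor against which the reported 84.3% baseline and 99.8%–100% results can be read.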