A global assessment of BirdNET performance: differences among continents, biomes, and species

Abstract

Recent advances in machine learning have accelerated automated species detection across diverse ecological domains, enabling large-scale, non-invasive monitoring of biodiversity. In ornithological research, coupling passive acoustic monitoring (PAM) with rapidly developing identification tools such as BirdNET, a deep learning–based sound recognition algorithm, offers new opportunities for surveying vocally active bird communities. Yet BirdNET's performance across diverse ecological and biogeographic contexts remains to be quantified. Here, we present the first worldwide evaluation of BirdNET, using 4,224 one-minute soundscapes from 67 sites across 28 administrative regions, annotated by local experts and covering 1,020 species. Specifically, we assessed BirdNET's capacity to correctly identify individual vocalisations and to characterise bird communities from the automated analysis of passively collected soundscapes. We further analysed how its performance varies across continents, biomes, species, and minimum confidence thresholds. The proportion of correct BirdNET predictions (precision) was generally high and consistent across continents (range: 0.57–0.71 at the vocalisation level) and biomes (range: 0.55–0.76 at the vocalisation level). In contrast, the proportion of vocalisations or species successfully detected (recall) was generally lower and more heterogeneous across continents (range: 0.24–0.52 at the vocalisation level) and biomes (range: 0.34–0.72 at the vocalisation level), reflecting differences in species coverage and local ecological context. BirdNET's predictive power, as measured by the Precision-Recall Area Under the Curve (PR AUC), was highest in North America, Oceania, and Europe (range: 0.16–0.23 at the vocalisation level), moderate in Central/South America (0.13), and lowest in Africa and Asia (range: 0.03–0.04). Species-specific analyses revealed substantial heterogeneity in detection accuracy, with optimal confidence thresholds varying widely by species and analytical goal. Our results establish a global reference point for BirdNET reliability and highlight where algorithmic refinement and expanded acoustic sampling are most needed.
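
The three metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation pipeline. The function names, the toy data, the assumption that each prediction matches at most one expert annotation, and the trapezoidal integration over a handful of thresholds are all hypothetical choices made here for illustration.

```python
from typing import List, Sequence, Tuple


def precision_recall(
    scores: Sequence[float],
    matches: Sequence[bool],
    n_missed: int,
    threshold: float,
) -> Tuple[float, float]:
    """Precision and recall after discarding predictions below `threshold`.

    scores[i]  : confidence of prediction i
    matches[i] : True if prediction i agrees with an expert annotation
    n_missed   : annotated vocalisations that received no prediction at all
    Assumes a one-to-one match between predictions and annotations.
    """
    kept = [m for s, m in zip(scores, matches) if s >= threshold]
    true_pos = sum(kept)
    total_annotated = sum(matches) + n_missed
    # Convention (an assumption): precision is 1.0 when nothing is predicted.
    precision = true_pos / len(kept) if kept else 1.0
    recall = true_pos / total_annotated if total_annotated else 0.0
    return precision, recall


def pr_auc(
    scores: Sequence[float],
    matches: Sequence[bool],
    n_missed: int,
    thresholds: Sequence[float],
) -> float:
    """Trapezoidal area under the precision-recall points at the given thresholds."""
    points: List[Tuple[float, float]] = sorted(
        (r, p)
        for p, r in (
            precision_recall(scores, matches, n_missed, t) for t in thresholds
        )
    )
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


# Toy data (invented): five predictions, two annotations never detected.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
matches = [True, True, False, True, False]
n_missed = 2

for t in (0.1, 0.5, 0.85):
    p, r = precision_recall(scores, matches, n_missed, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
print(f"PR AUC over these thresholds: {pr_auc(scores, matches, n_missed, [0.1, 0.5, 0.85]):.3f}")
```

Raising the minimum confidence threshold in this toy example trades recall for precision, which is the trade-off that makes the best threshold depend on the species and on the analytical goal.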
