Stratified Active Learning for Spatiotemporal Generalisation in Large-Scale Bioacoustic Monitoring
Abstract
Active learning optimises machine learning model training through the data-efficient selection of informative samples for annotation and training. In the context of biodiversity monitoring using passive acoustic monitoring, active learning offers a promising strategy to reduce the fundamental data and annotation bottleneck and improve global training efficiency. However, the generalisability of model performance across ecologically relevant strata (e.g. sites, seasons) is often overlooked. As passive acoustic monitoring is extended to larger scales and finer resolutions, inter-strata spatiotemporal variability also increases. We introduce and investigate the concept of stratified active learning to achieve reliable and generalisable model performance across deployment conditions. We compare implicit cluster-based diversification methods with explicit stratification, demonstrating that cross-strata generalisation is a function of stratum divergence, not sampling balance. Additionally, mutual information analysis and exclusion analysis show that spatiotemporal context can explain a substantial proportion of species label variance and inform sampling decisions.
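As a rough illustration of the mutual information analysis mentioned above, the following minimal sketch estimates how much of the species label entropy is accounted for by a stratum identifier such as a recording site. It is not the authors' implementation: the function names, the toy site and species values, and the choice to normalise by label entropy are all assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(strata, labels):
    """I(S; Y) = H(S) + H(Y) - H(S, Y), using empirical frequencies."""
    joint = list(zip(strata, labels))
    return entropy(strata) + entropy(labels) - entropy(joint)


def normalised_mi(strata, labels):
    """Share of label entropy explained by the stratum (assumed normalisation)."""
    h_labels = entropy(labels)
    return mutual_information(strata, labels) / h_labels if h_labels > 0 else 0.0


# Hypothetical toy data: each recording has a site (stratum) and a species label.
sites = ["site_A", "site_A", "site_B", "site_B"]
dependent_labels = ["robin", "robin", "wren", "wren"]    # species fully determined by site
independent_labels = ["robin", "wren", "robin", "wren"]  # species unrelated to site

score_dependent = normalised_mi(sites, dependent_labels)
score_independent = normalised_mi(sites, independent_labels)
```

A score near 1 would suggest that the stratum largely determines which species are present, while a score near 0 would suggest that the stratum carries little information about the labels. On real data the empirical estimate is biased upwards for small samples, so it is best read as indicative rather than exact.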