DualStack: Multi-Resolution Spectrogram Fusion Improves Bird Sound Classification for Ecological Monitoring
Abstract
Automated bird sound classification plays a critical role in biodiversity assessment, ecological monitoring, and conservation research. Many current approaches use single-resolution spectrograms, which fail to fully capture the multi-scale acoustic features of avian vocalizations. We present DualStack, a new method that vertically stacks high-resolution and low-resolution Mel spectrograms into a single image, allowing convolutional neural networks to jointly learn fine temporal and broad spectral patterns. Using a dataset of 967 recordings from 22 species sourced from Xeno-Canto, DualStack achieved 86.63% classification accuracy, outperforming both single-resolution baselines and a BiParallel ResNet18 multi-branch architecture. This method improves species identification accuracy while remaining applicable to real-time monitoring, supporting more effective conservation efforts and large-scale ecological studies.
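
The stacking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the two resolutions are produced, so the sketch assumes that "resolution" refers to FFT window length and number of Mel bands, that both spectrograms share one hop length so their widths match, and that each is min-max normalised before stacking. The function names (`log_mel`, `dual_stack`), all parameter values, and the synthetic test tone are hypothetical choices made for illustration only.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_mel(y, sr, n_fft, hop, n_mels):
    """Log-power Mel spectrogram, shape (n_mels, n_frames)."""
    y = np.pad(y, n_fft // 2, mode="reflect")  # centre each frame on its hop
    n_frames = 1 + (len(y) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T
    return 10.0 * np.log10(mel + 1e-10)


def minmax(x):
    """Scale an array to the range [0, 1]."""
    return (x - x.min()) / (x.max() - x.min() + 1e-12)


def dual_stack(y, sr, hop=256):
    """Stack a high- and a low-resolution Mel spectrogram vertically.

    Assumed settings (not taken from the paper): the high-resolution view
    uses a 2048-sample window and 128 Mel bands, the low-resolution view a
    512-sample window and 32 Mel bands. A shared hop keeps the widths equal.
    """
    high_res = log_mel(y, sr, n_fft=2048, hop=hop, n_mels=128)
    low_res = log_mel(y, sr, n_fft=512, hop=hop, n_mels=32)
    return np.vstack([minmax(high_res), minmax(low_res)])


# Synthetic one-second rising tone, standing in for a real recording.
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * (2000 + 1500 * t) * t)

image = dual_stack(y, sr)
print(image.shape)  # (160, 87)
```

The resulting two-dimensional array can be treated as a single-channel image and passed to a convolutional network in the usual way. Sharing the hop length is one simple way to make the two spectrograms the same width; resizing one of them would be an alternative.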