DualStack: Multi-Resolution Spectrogram Fusion Improves Bird Sound Classification for Ecological Monitoring

Abstract

Automated bird sound classification plays a critical role in biodiversity assessment, ecological monitoring, and conservation research. Many current approaches use single-resolution spectrograms, which fail to fully capture the multi-scale acoustic features of avian vocalizations. We present DualStack, a new method that vertically stacks high-resolution and low-resolution Mel spectrograms into a single image, allowing convolutional neural networks to jointly learn fine temporal and broad spectral patterns. Using a dataset of 967 recordings from 22 species sourced from Xeno-Canto, DualStack achieved 86.63% classification accuracy, outperforming both single-resolution baselines and a BiParallel ResNet18 multi-branch architecture. This method improves species identification accuracy while remaining applicable to real-time monitoring, supporting more effective conservation efforts and large-scale ecological studies.
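The stacking idea can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the abstract does not give the spectrogram settings, so the FFT sizes, Mel band counts, shared hop length, per-block min-max normalisation, and the order of the two blocks are all assumptions, and the function names (`mel_filterbank`, `log_mel`, `dual_stack`) are hypothetical. The sketch uses only NumPy and computes both Mel spectrograms with the same hop length so that their time axes line up before they are stacked vertically.

```python
import numpy as np


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # n_mels + 2 band edges, evenly spaced on the Mel scale
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_mel(y, sr, n_fft, hop, n_mels):
    """Log-power Mel spectrogram, shape (n_mels, n_frames)."""
    # Reflect-pad so frames are centred; for even n_fft the frame count
    # is then 1 + len(y) // hop, independent of n_fft.
    y = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(y) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T
    return 10.0 * np.log10(mel + 1e-10)


def min_max(x):
    """Scale an array to [0, 1] (assumed normalisation, not from the paper)."""
    return (x - x.min()) / (x.max() - x.min() + 1e-12)


def dual_stack(y, sr, hop=256):
    """Stack two Mel spectrograms of different resolution into one image.

    All parameter values below are illustrative assumptions.
    """
    fine = log_mel(y, sr, n_fft=2048, hop=hop, n_mels=128)   # more Mel bands
    coarse = log_mel(y, sr, n_fft=512, hop=hop, n_mels=64)   # fewer Mel bands
    # Same hop length gives the same number of frames, so the two blocks
    # can be concatenated along the frequency (vertical) axis.
    return np.vstack([min_max(fine), min_max(coarse)])
```

With these assumed settings, calling `dual_stack(y, sr)` on a mono waveform `y` returns a single 2-D array with 192 rows (128 + 64 Mel bands) and one column per frame, which could then be fed to a CNN as a one-channel image.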
