Shorter FFT Windows Improve Cross-Domain Generalization in CNN-Based Cetacean Whistle Detection: A Controlled Sensitivity Analysis

Abstract

In spectrogram-based Convolutional Neural Network (CNN) detectors for Passive Acoustic Monitoring (PAM), the FFT window length directly governs the spectro-temporal representation presented to the classifier, yet its effect on detection performance has received limited systematic treatment. This study presents a controlled sensitivity analysis of FFT window length (256, 512, and 1024 samples) on binary bottlenose dolphin (Tursiops truncatus) whistle detection, evaluated through stratified 10-fold cross-validation on an in-domain dataset (192 kHz) and an independent cross-domain benchmark. In-domain performance is uniformly high across all configurations (macro F1-score 0.98; Wilcoxon, all p > 0.05). Cross-domain results diverge substantially: the shortest window (256 samples) is significantly superior (p = 0.006, rank-biserial r = 0.89). The mechanism is an upsampling amplification effect: coarser spectral bins produce wider, higher-contrast frequency-modulated traces after resampling to fixed image dimensions. This superiority is threshold-invariant: precision equals 1.000 across all tested configurations and decision thresholds. A multiclass extension to five vocalization categories (macro F1-score = 0.843) confirms the framework’s scalability. All experiments were conducted within a six-stage open-source pipeline fully parameterized through a single configuration file, ensuring exact reproducibility. Software source code and both datasets are publicly available.
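
To make the resolution trade-off behind the three window lengths concrete, the following minimal sketch computes the frequency-bin width, the number of one-sided FFT bins, and the window duration at the 192 kHz rate stated above. The image height of 224 rows is a hypothetical value chosen only for illustration (the abstract does not state the image dimensions), as are the function and variable names; the sketch also assumes the full one-sided spectrum is resized, which may not match the actual pipeline.

```python
FS_HZ = 192_000             # in-domain sampling rate given in the abstract
WINDOWS = (256, 512, 1024)  # FFT window lengths compared in the study
IMAGE_HEIGHT = 224          # ASSUMPTION: hypothetical image height, not from the source


def resolution(n_fft, fs_hz=FS_HZ, image_height=IMAGE_HEIGHT):
    """Return basic spectro-temporal resolution figures for one window length."""
    n_bins = n_fft // 2 + 1  # one-sided spectrum of a real-valued signal
    return {
        "n_fft": n_fft,
        "bin_hz": fs_hz / n_fft,               # width of one frequency bin
        "n_bins": n_bins,
        "window_ms": 1000.0 * n_fft / fs_hz,   # duration covered by one window
        # Image rows that each frequency bin occupies after resizing to a
        # fixed height: above 1 means bins are stretched (upsampled).
        "rows_per_bin": image_height / n_bins,
    }


if __name__ == "__main__":
    for n in WINDOWS:
        r = resolution(n)
        print(
            f"n_fft={r['n_fft']:4d}  bin={r['bin_hz']:6.1f} Hz  "
            f"bins={r['n_bins']:3d}  window={r['window_ms']:.2f} ms  "
            f"rows/bin={r['rows_per_bin']:.2f}"
        )
```

Under these assumptions, the 256-sample window yields the fewest and widest bins, so each bin is stretched over more image rows than with 512 or 1024 samples. This is consistent with the direction of the upsampling amplification effect described above, although the actual magnitudes depend on the pipeline's real image dimensions and frequency band.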
