Shorter FFT Windows Improve Cross-Domain Generalization in CNN-Based Cetacean Whistle Detection: A Controlled Sensitivity Analysis
Abstract
In spectrogram-based Convolutional Neural Network (CNN) detectors for Passive Acoustic Monitoring (PAM), the FFT window length directly governs the spectro-temporal representation presented to the classifier, yet its effect on detection performance has received limited systematic treatment. This study presents a controlled sensitivity analysis of FFT window length (256, 512, and 1024 samples) on binary bottlenose dolphin (Tursiops truncatus) whistle detection, evaluated through stratified 10-fold cross-validation on an in-domain dataset (192 kHz) and an independent cross-domain benchmark. In-domain performance is uniformly high across all configurations (macro F1-score ≈ 0.98; Wilcoxon, all p > 0.05). Cross-domain results diverge substantially: the shortest window (256 samples) is significantly superior (p = 0.006, rank-biserial r = 0.89). The mechanism is an upsampling amplification effect: coarser spectral bins produce wider, higher-contrast frequency-modulated traces after resampling to fixed image dimensions. This superiority is threshold-invariant: precision equals 1.000 across all tested configurations and decision thresholds. A multiclass extension to five vocalization categories (macro F1-score = 0.843) confirms the framework’s scalability. All experiments were conducted within a six-stage open-source pipeline fully parameterized through a single configuration file, ensuring exact reproducibility. Software source code and both datasets are publicly available.
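To make the resolution trade-off behind the three window lengths concrete, the following minimal sketch computes the frequency-bin width, window duration, and number of one-sided frequency bins at the 192 kHz sampling rate stated above. The image height of 224 pixels and the function name `stft_geometry` are illustrative assumptions, not values taken from the paper, and the sketch assumes the full frequency axis is mapped onto the image without band cropping.

```python
def stft_geometry(sample_rate_hz, n_fft, image_height_px):
    """Return basic STFT geometry for one FFT window length.

    bin_width_hz : spacing between adjacent frequency bins
    window_ms    : duration of one analysis window
    n_bins       : number of one-sided (real-input) frequency bins
    px_per_bin   : image rows covered by one bin after resizing the
                   full frequency axis to image_height_px rows
    """
    bin_width_hz = sample_rate_hz / n_fft
    window_ms = 1000.0 * n_fft / sample_rate_hz
    n_bins = n_fft // 2 + 1
    px_per_bin = image_height_px / n_bins
    return bin_width_hz, window_ms, n_bins, px_per_bin


SAMPLE_RATE_HZ = 192_000   # in-domain sampling rate from the abstract
IMAGE_HEIGHT_PX = 224      # ASSUMPTION: hypothetical CNN input height

for n_fft in (256, 512, 1024):
    width, dur, bins, px = stft_geometry(SAMPLE_RATE_HZ, n_fft, IMAGE_HEIGHT_PX)
    print(f"n_fft={n_fft:5d}  bin={width:6.1f} Hz  window={dur:5.2f} ms  "
          f"bins={bins:4d}  px/bin={px:4.2f}")
```

Under these assumptions, each bin of the 256-sample spectrogram covers more than one image row after resizing, while each bin of the 1024-sample spectrogram covers less than one, which is consistent with the abstract's description of coarser bins yielding wider traces. The actual ratios depend on the image dimensions and frequency band used in the pipeline.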