Enhancing Soundscape Characterization and Pattern Analysis Using Low-Dimensional Deep Embeddings on a Large-Scale Dataset
Abstract
Soundscape monitoring has become an increasingly important tool for studying ecological processes and supporting habitat conservation. While many recent advances focus on identifying species through supervised learning, there is growing interest in understanding the soundscape as a whole, including patterns that extend beyond individual vocalizations. This broader view requires unsupervised approaches capable of capturing meaningful structures related to temporal dynamics, frequency content, spatial distribution, and ecological variability. In this study, we present a fully unsupervised framework for analyzing large-scale soundscape data using deep learning. We applied a convolutional autoencoder (Soundscape-Net) to extract acoustic representations from over 60,000 recordings collected across a grid-based sampling design in the Rey Zamuro Reserve in Colombia. These features were first compared with other audio characterization methods and showed superior performance in multiclass classification, with accuracies of 0.85 for habitat cover identification and 0.89 for time-of-day classification across 13 days. For the unsupervised study, optimized dimensionality reduction methods (Uniform Manifold Approximation and Projection and Pairwise Controlled Manifold Approximation and Projection) were applied to project the learned features, achieving trustworthiness scores above 0.96. Clustering was then performed using KMeans and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and evaluated with metrics such as the silhouette score; scores above 0.45 were obtained, supporting the robustness of the discovered latent acoustic structures.
To interpret and validate the resulting clusters, we combined multiple strategies: spatial mapping through interpolation, analysis of acoustic index variance to understand the cluster structure, and graph-based connectivity analysis to identify ecological relationships between the recording sites. Our results demonstrate that this approach can uncover both local and broad-scale patterns in the soundscape, providing a flexible and interpretable pathway for unsupervised ecological monitoring.
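The evaluation steps named in the abstract (projection, trustworthiness, clustering, silhouette) can be illustrated with a minimal sketch. This is not the authors' code, and it rests on several assumptions: the embeddings are synthetic blobs standing in for Soundscape-Net features, PCA is swapped in for the manifold projections so that the example depends only on scikit-learn, and all sizes and hyperparameters (`n_neighbors=15`, `n_clusters=4`, `eps=0.5`, `min_samples=5`) are hypothetical.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness
from sklearn.metrics import silhouette_score

# ASSUMPTION: synthetic stand-in for learned embeddings
# (600 "recordings", 64 features each); not real soundscape data.
features, _ = make_blobs(
    n_samples=600, n_features=64, centers=4, cluster_std=1.0, random_state=0
)

# ASSUMPTION: PCA replaces the manifold projections used in the study,
# purely to avoid extra dependencies in this sketch.
projection = PCA(n_components=2, random_state=0).fit_transform(features)

# Trustworthiness measures how well local neighborhoods of the original
# space are preserved in the low-dimensional projection (range 0 to 1).
trust = trustworthiness(features, projection, n_neighbors=15)

# KMeans on the projected points, scored with the silhouette coefficient.
kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(projection)
kmeans_silhouette = silhouette_score(projection, kmeans_labels)

# DBSCAN labels noise points as -1; score only the clustered points,
# and only when at least two clusters were found.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(projection)
clustered = dbscan_labels != -1
dbscan_silhouette = None
if len(set(dbscan_labels[clustered])) >= 2:
    dbscan_silhouette = silhouette_score(
        projection[clustered], dbscan_labels[clustered]
    )

print(f"trustworthiness: {trust:.3f}")
print(f"KMeans silhouette: {kmeans_silhouette:.3f}")
print(f"DBSCAN silhouette: {dbscan_silhouette}")
```

With real embeddings, the projection step would be replaced by the chosen manifold method, and the resulting scores would depend on the data and on tuning; the values printed here say nothing about the results reported in the abstract.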