Shared acoustic manifolds for exploratory comparison of passerine vocalizations
Abstract
This study presents a fixed-parameter pipeline for reproducibly embedding frame-level representations of passerine vocalizations in shared low-dimensional spaces. Three passerine species are considered, Eurasian Wren, Tree Pipit and Common Chaffinch, with four individuals selected per species. For each species, vocalization frames from all four individuals are mapped into a single three-dimensional coordinate system, which allows individuals to be compared while preserving temporal continuity. The pipeline follows a controlled protocol and takes an unsupervised, geometry-first exploratory approach. Two feature representations are used: MFCCs (40 coefficients with delta and delta-delta) and 80-bin chroma vectors. The two feature sets offer complementary analytical lenses on the signal, from spectral-envelope dynamics (MFCC) to relative frequency organization (chroma), without imposing discrete musical categories. Dimensionality reduction consists of a PCA preconditioning step to 20 components (PCA-20) followed by a UMAP embedding, yielding six manifolds in total (two feature spaces × three species). The resulting embeddings are visualized as continuous trajectories in two layouts: one in which each individual is drawn in a distinct solid color, and an augmented view in which descriptor values are overlaid as color coding after the embedding has been computed. The descriptors include the spectral centroid and a chroma-derived concentration measure introduced in this work, Chroma Energy Concentration (CEC), both visualized as scalar fields on the manifold geometry. A supplementary case study demonstrates event-level backtracking from localized manifold regions to the underlying audio, enabling the identification of recurring vocal events concentrated in specific embedding regions.
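The per-species MFCC branch of the pipeline can be sketched with NumPy as a minimal illustration, not the authors' implementation. Each individual's 40 MFCCs are stacked with their delta and delta-delta (40 × 3 = 120 values per frame), frames from all four individuals are pooled, and the pooled matrix is reduced to 20 principal components. Several details here are assumptions rather than details stated in the abstract: the deltas use a plain central difference, the features are z-scored before PCA, and the function names and random stand-in data are hypothetical.

```python
import numpy as np

def delta(feat: np.ndarray) -> np.ndarray:
    """First-order temporal derivative along the frame axis (axis=1).

    Assumption: a plain central difference (np.gradient); librosa.feature.delta
    uses a Savitzky-Golay filter instead, so values differ slightly.
    """
    return np.gradient(feat, axis=1)

def mfcc_frame_vectors(mfcc: np.ndarray) -> np.ndarray:
    """Stack 40 MFCCs with delta and delta-delta into a (frames, 120) matrix."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return np.vstack([mfcc, d1, d2]).T

def pool_species(per_individual):
    """Concatenate the frame matrices of all individuals of one species.

    Also returns, for every row, the individual index and the frame index, so
    that points in the shared embedding can be traced back to the recordings.
    """
    X = np.vstack(per_individual)
    ids = np.concatenate([np.full(len(x), i) for i, x in enumerate(per_individual)])
    frames = np.concatenate([np.arange(len(x)) for x in per_individual])
    return X, ids, frames

def pca_precondition(X: np.ndarray, k: int = 20) -> np.ndarray:
    """Z-score each feature (an assumption), then project onto the top-k PCs."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Rows of Vt are principal directions ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ Vt[:k].T

# Hypothetical stand-in data: four individuals, 40 MFCCs each, varying lengths.
rng = np.random.default_rng(0)
mfccs = [rng.normal(size=(40, n)) for n in (300, 250, 400, 350)]
X, ids, frames = pool_species([mfcc_frame_vectors(m) for m in mfccs])
Z = pca_precondition(X, k=20)
print(X.shape, Z.shape)  # (1300, 120) (1300, 20)
```

The 20-dimensional `Z` would then go to UMAP. With the umap-learn package, one possible call is `umap.UMAP(n_components=3, random_state=42).fit_transform(Z)`. Fixing `random_state` is one way to make the layout repeatable, in keeping with a fixed-parameter design. The abstract does not give the paper's UMAP settings, so these values are placeholders. Fitting a single embedding on the pooled frames is what places all individuals of a species in one coordinate system. Separate per-individual fits would produce axes that cannot be compared. The chroma branch could be pooled and reduced in the same way.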
The framework operates independently of labeling or categorization: it provides a descriptive interface intended to complement spectrogram-based analysis, supporting qualitative comparison and hypothesis generation.
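The post-embedding descriptor overlay and the event-level backtracking can be illustrated with a short NumPy sketch under stated assumptions. It assumes a magnitude spectrogram per recording and, for every embedded point, an individual index and a frame index. The function names, the spherical selection region, and the sample rate and hop length are hypothetical choices. Frame k is simply taken to start at k × hop samples, which ignores window length and centering. The spectral centroid follows its standard definition as the magnitude-weighted mean frequency. CEC is not sketched because the abstract does not give its definition.

```python
import numpy as np

def spectral_centroid(S: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    """Per-frame spectral centroid in Hz for a magnitude spectrogram S (bins x frames)."""
    # Magnitude-weighted mean frequency; the epsilon guards against silent frames.
    return (freqs[:, None] * S).sum(axis=0) / (S.sum(axis=0) + 1e-12)

def backtrack_region(Y, ids, frames, center, radius, hop, sr):
    """Map embedding points inside a ball back to time spans in the audio.

    Hypothetical helper: consecutive selected frames of the same individual
    are merged into one event. Returns (individual, start_s, end_s) tuples,
    assuming frame k starts at k * hop samples.
    """
    inside = np.linalg.norm(Y - np.asarray(center), axis=1) <= radius
    events = []
    for ind in np.unique(ids[inside]):
        f = np.sort(frames[inside & (ids == ind)])
        # Start a new event wherever the frame index jumps by more than one.
        breaks = np.where(np.diff(f) > 1)[0] + 1
        for run in np.split(f, breaks):
            events.append((int(ind), float(run[0] * hop / sr), float((run[-1] + 1) * hop / sr)))
    return events

# Toy data (hypothetical): a 1025-bin, 3-frame spectrogram and a 7-point embedding.
sr, hop = 22050, 512
freqs = np.linspace(0, sr / 2, 1025)
S = np.zeros((1025, 3))
S[100, 0] = 1.0
S[200, 1] = 1.0
S[[100, 300], 2] = 1.0  # two equal peaks whose mean frequency sits at bin 200
centroid = spectral_centroid(S, freqs)

ids = np.array([0, 0, 0, 0, 1, 1, 1])
frames = np.array([0, 1, 2, 3, 0, 1, 2])
Y = np.array([[0, 0, 0], [0.1, 0, 0], [5, 5, 5], [0, 0.1, 0],
              [5, 5, 5], [0, 0, 0.1], [5, 5, 5]], dtype=float)
events = backtrack_region(Y, ids, frames, center=(0, 0, 0), radius=0.5, hop=hop, sr=sr)
print(events)
```

In this toy case, individual 0 contributes two events because its selected frames 0, 1 and 3 are not contiguous. Individual 1 contributes one event. Mapping each event back to a time span gives the audio excerpts that would be inspected alongside the spectrogram.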