Core dimensions of the perceptually relevant voice identity space

Claudia Roswandowitz
Florian P. Mahner
Martin N Hebart
Volker Dellwo

Abstract

Human listeners are remarkably skilled at identifying voices, yet the mental representations that support voice identity processing remain poorly understood. In this study, we present a data-driven approach that reveals core dimensions of the voice identity space. We compiled a large-scale voice corpus (VocDat) of 337 male speakers, each producing a unique sentence. Using a Sparse Positive Similarity Embedding (SPoSE) model trained on voice similarity judgments from 389 participants, we identified a five-dimensional voice identity space that captures human between-speaker similarity judgments. Regression analyses showed that these model dimensions were characterised by distinct combinations of acoustic voice properties, with pitch, spectral slope, and syllable rate emerging as the most consistent and informative features. These acoustic labels closely aligned with perceptual voice attributes provided by trained phoneticians (e.g., low pitch, old age, fast speech), demonstrating that the dimensions can be interpreted in a perceptually meaningful way. By combining a data-driven modelling approach with a large and acoustically diverse speaker sample, and by characterising the dimensions of voice identity space in both acoustic and perceptual terms, our findings provide perceptually grounded insights into the structure of voice identity representations.
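
For readers unfamiliar with SPoSE, the following is a minimal sketch of the kind of objective such a model minimises. It is not the authors' implementation. The abstract does not state the judgment task, so the sketch assumes the triplet odd-one-out format used in earlier SPoSE work; the function names, the toy embedding, the averaging over triplets, and the penalty weight `lam` are all illustrative assumptions.

```python
import numpy as np

def triplet_choice_probs(X, triplets):
    """Probability that the first two items of each triplet form the chosen pair.

    X: (n_items, n_dims) non-negative embedding (one row per speaker).
    triplets: (n_triplets, 3) integer array with columns (i, j, k), where
    (i, j) is the pair judged most similar and k is the odd one out.
    This triplet format is an assumption, not taken from the abstract.
    """
    i, j, k = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    # Similarity of each pair is the dot product of the two embeddings.
    s_ij = np.sum(X[i] * X[j], axis=1)
    s_ik = np.sum(X[i] * X[k], axis=1)
    s_jk = np.sum(X[j] * X[k], axis=1)
    logits = np.stack([s_ij, s_ik, s_jk], axis=1)
    # Subtract the row maximum for a numerically stable softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p[:, 0] / p.sum(axis=1)

def spose_loss(X, triplets, lam=0.01):
    """Mean negative log-likelihood of the choices plus an L1 sparsity penalty."""
    nll = -np.mean(np.log(triplet_choice_probs(X, triplets)))
    return nll + lam * np.abs(X).sum()

# Toy example: speakers 0 and 1 load on the same dimension, speaker 2 does not.
X_toy = np.array([[2.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
toy_triplets = np.array([[0, 1, 2]])
print(triplet_choice_probs(X_toy, toy_triplets))
print(spose_loss(X_toy, toy_triplets, lam=0.01))
```

In a full fit, the embedding would be optimised by gradient descent with its entries constrained to stay non-negative. The L1 penalty drives unneeded dimensions towards zero, which is how a small number of dimensions (five in this study) can emerge from the data.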

Version published to 10.31234/osf.io/3wfds_v1 on OSF Preprints
Feb 16, 2026

Related articles

Voice perception across languages: Reduced similarity between speakers, stable identity within speakers

This article has 1 author:
1. Tianze Xu
This article has no evaluations
Latest version Feb 21, 2026

Reverse-Engineering Speech and Music Categorization from a Single Sound Source

This article has 5 authors:
1. Lauren K Fink
2. Madita Hörster
3. David Poeppel
4. Melanie Wald-Fuhrmann
5. Pauline Larrouy-Maestri
This article has no evaluations
Latest version Jan 25, 2026

Towards an Attentional Theory of Second-Language Speech Perception: Evidence from Cue Ecology

This article has 6 authors:
1. Annie C. Tremblay
2. Taehong Cho
3. Mirjam Broersma
4. Hyoju Kim
5. Quentin Zhen Qin
6. Jiayu Liang
This article has no evaluations
Latest version Feb 6, 2026
