Voice perception across languages: Reduced similarity between speakers, stable identity within speakers
Abstract
Purpose: Despite the central role of voice perception in social interaction, the mechanisms underlying cross-linguistic voice similarity judgments remain underexplored. This study examined how voice similarity is perceived within and across languages, using bilingual speech to isolate language-driven effects while controlling for anatomical variation.

Method: English-speaking listeners with and without Cantonese proficiency completed a perceptual similarity rating task involving Cantonese–English bilingual speakers. Listeners rated the similarity of voice pairs on a nine-point scale, including both within-language and cross-language comparisons. Multidimensional scaling (MDS) was applied to the dissimilarity ratings to model perceptual voice space, and acoustic–perceptual correlations were used to relate perceptual dimensions to acoustic properties of the speech signal.

Results: Cross-language voice pairs were perceived as significantly more dissimilar than within-language pairs, particularly for same-speaker items, indicating a language mismatch effect. No robust language familiarity effect was observed between listener groups. Importantly, same-speaker pairs were consistently judged as more similar than different-speaker pairs even across languages, a finding further supported by MDS, which revealed a stable underlying perceptual structure in which speakers generally clustered with themselves across languages. Perceptual dimensions were primarily associated with pitch, vocal tract size, and harmonics-to-noise spectral characteristics, with their relative weighting modulated by language context.

Conclusions: The findings indicate that while language mismatch reduces perceived similarity between speakers, the vocal identity of a speaker remains perceptually stable across languages. These results refine the prototype model of voice perception by demonstrating dynamic, context-sensitive weighting of acoustic cues in bilingual voice processing.
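For readers unfamiliar with the analysis pipeline described in the Method section, the sketch below illustrates the general approach of deriving a perceptual voice space from pairwise similarity ratings via non-metric MDS. This is not the authors' code; the matrix of ratings, the number of voices, and the linear conversion from a nine-point similarity scale to dissimilarities are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' pipeline): recovering a perceptual voice
# space from pairwise similarity ratings with non-metric multidimensional scaling.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)

n_voices = 8  # hypothetical number of voice stimuli (speaker x language items)

# Placeholder symmetric matrix of mean similarity ratings on a 1-9 scale;
# real data would be averaged listener ratings for each voice pair.
sim = rng.uniform(1, 9, size=(n_voices, n_voices))
sim = (sim + sim.T) / 2
np.fill_diagonal(sim, 9)

# Convert similarities to dissimilarities (assumption: simple inversion of the scale).
dissim = 9 - sim

# Non-metric MDS on the precomputed dissimilarity matrix; two dimensions for illustration.
mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)

print("Stress:", round(mds.stress_, 3))
print("Perceptual coordinates:\n", coords)
```

In such an analysis, each resulting dimension can then be correlated with acoustic measures (e.g., fundamental frequency or formant-based estimates of vocal tract size) to interpret the perceptual space, which is the spirit of the acoustic–perceptual correlations reported in the Results.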