Perception of Humanness Is Affected by Speech Content
Abstract
The increasing use of computer-generated speech in various applications has raised questions about how people perceive synthetic voices. This study investigates the role of linguistic information in the perception of humanness in speech. We conducted two experiments in which native German-, Spanish-, and Turkish-speaking participants rated the human-likeness of human and text-to-speech (TTS)-generated voices. By presenting German sentences as well as versions of those sentences with manipulated syntax and semantics, we examined the role of three types of linguistic information (phonetics, semantics, and syntax) in humanness perception. Acoustic analyses revealed differences between human and TTS-generated voices in summary acoustics and in the dynamic contours of pitch and intensity, showing that TTS-generated voices are not yet fully aligned with human voices in voice quality and prosody. Importantly, behavioral results showed that these acoustic differences were more salient to native German listeners, who distinguished more sharply between human and synthetic voices. In addition to the role of phonetic or phonological familiarity, we observed a role of both syntax and semantics in humanness perception: the manipulated sentences sounded less human regardless of the speaker (i.e., TTS-generated or human), but only to native speakers. Lastly, humanness perception of speech appears to be relatively idiosyncratic, as indicated by the individual differences observed. Altogether, this study contributes to our understanding of the interplay between linguistic and paralinguistic information in speech perception and clarifies how listeners perceive their increasingly synthetically generated soundscape.