Making an impression: Participant-led voice synthesis reveals the acoustic signatures of first impressions
Abstract
Listeners rapidly form first impressions from voices, inferring multiple person characteristics within milliseconds. We employed a novel method, Self-Steered Sound Synthesis (S4), to identify and compare the acoustic signatures underlying these impressions. Participants interactively used S4 to synthesise voices expressing six person characteristics (age, masculinity, health, attractiveness, dominance, and trustworthiness) by manipulating four perceptually salient acoustic dimensions: mean pitch, pitch excursion, breathiness, and formant spacing. Masculinity, older age, and dominance were conveyed by lowering mean pitch and formant spacing, consistent with projecting the impression of a large person, and by flattening the intonation. Physical health, attractiveness, and trustworthiness were conveyed by choosing less extreme, more "typical" acoustic properties. A second perceptual experiment confirmed that the synthesised voices from Experiment 1 conveyed the intended person characteristics to an independent sample of listeners, and that these listeners relied on similar acoustic cues for their evaluations. From a methodological perspective, we draw similar conclusions from two drastically different approaches, providing a comprehensive account of first impression formation that bridges voice production (or synthesis) and perception. Our converging findings also demonstrate the methodological robustness of S4. From a theoretical perspective, our findings extend frameworks that conceptualise first impressions within a continuous "trait space", highlighting the graded and intercorrelated nature of different person characteristics at the perceptual and conceptual level. We extend this framework by showing that intercorrelations arise not only in perceptual judgments but also in the acoustic signatures of person characteristics, thereby integrating acoustic cues into perceptual models of voice perception.