Voice clones sound realistic but not (yet) hyperrealistic
Abstract
AI-generated voices are increasingly prevalent in our lives, via virtual assistants, automated customer service, and voice-overs. With increased availability and affordability of AI-generated voices, we need to examine how humans perceive them. Recently, an intriguing effect was reported in AI-generated faces, where such face images were perceived as more human than images of real humans - a “hyperrealism effect.” Here, we tested whether a “hyperrealism effect” also exists for AI-generated voices. We investigated the extent to which AI-generated voices sound real to human listeners, and whether listeners can accurately distinguish between human and AI-generated voices. We also examined perceived social trait characteristics (trustworthiness and dominance) of human and AI-generated voices. We tested these questions using AI-generated voices created with and without a specific human counterpart (i.e., voice clones, and voices generated from the latent space of a large voice model). We found that voice clones can sound as real as human voices, making it difficult for listeners to distinguish between them. However, we did not observe a hyperrealism effect. Both types of AI-generated voices were evaluated as more dominant than human voices, with some AI-generated voices also being perceived as more trustworthy. These findings raise questions for future research: Can hyperrealistic voices be created with more advanced technology, or is the lack of a hyperrealism effect due to differences between voice and face (image) perception? Our findings also highlight the potential for AI-generated voices to be used for misinformation and fraud, alongside opportunities to use realistic AI-generated voices for beneficial purposes.