Voice deep fakes sound realistic but not (yet) hyperrealistic

Nadine Lavan
Mairi Irvine
Victor Rosi
Carolyn McGettigan

Abstract

AI-generated voices are increasingly prevalent in our lives through virtual assistants, automated customer service, and voice-overs. As AI-generated voices become more available and affordable, we need to examine how humans perceive them. Recently, an intriguing effect was reported for AI-generated faces: such face images were perceived as more human than images of real humans, a “hyperrealism effect.” Here, we tested whether a hyperrealism effect also exists for AI-generated voices. We investigated the extent to which AI voices sound real to human listeners, and whether listeners can accurately distinguish between human and AI voices. We also examined the perceived social traits (trustworthiness and dominance) of human and AI voices. We tested these questions using AI voices generated with and without a specific human counterpart (i.e., voice clones/deep fakes, and voices generated from the latent space of a large voice model).

We find that deep fake voices can sound as real as human voices, making it difficult for listeners to distinguish between them. However, we did not observe a hyperrealism effect. Both types of AI-generated voices were rated as more dominant than human voices, and some AI voices were also perceived as more trustworthy.

These findings raise questions for future research: Can hyperrealistic voices be created with more advanced technology, or is the absence of a hyperrealism effect due to differences between voice and face (image) perception? Our findings also highlight the potential for AI voices to be used for misinformation and fraud, alongside opportunities to use realistic AI-generated voices for beneficial purposes.

Version published to 10.31234/osf.io/jqg6e on OSF Preprints
Nov 25, 2024

Related articles

“Eh? Aye!”: Categorisation bias for natural human vs AI-augmented voices is influenced by dialect.
This article has 1 author:
1. Neil William Kirk
This article has no evaluations. Latest version May 28, 2025

PERCEPTUAL EVALUATION OF SYNTHETIC VOICE DETECTION WITH DYSPHONIC SPEAKERS
This article has 1 author:
1. Eugenia San Segundo
This article has no evaluations. Latest version Jun 30, 2025

Finding the Human Voice in AI: Insights on the Perception of AI-Voice Clones from Naturalness and Similarity Ratings
This article has 9 authors:
1. Linda Bakkouche
2. Charles Mcghee
3. Emily Lau
4. Stephanie Cooper
5. Xinbing Luo
6. Madeleine Rees
7. Kai Alter
8. Brechtje Post
9. Julia Schwarz
This article has no evaluations. Latest version Jun 3, 2025