PERCEPTUAL EVALUATION OF SYNTHETIC VOICE DETECTION WITH DYSPHONIC SPEAKERS

Abstract

In the present work, we have designed a perceptual experiment comprising 80 stimuli: 40 samples of natural voices and 40 samples of their corresponding deepfakes. Of the natural samples, 20 are from dysphonic patients and 20 from a control group (half English and half Spanish in each group). Within the dysphonic group, 5 patients per language are classified as mild-moderate and 5 as severe according to the CAPE-V scale. Listeners indicate, for each recording, whether it is a synthetic or a human voice. Although some perceptual experiments have tested human performance in detecting synthetic voices, studies involving dysphonic voices are far less common. Our hypothesis is that dysphonic voices are more likely to be perceived as human voices than as deepfakes. Just as human faces are characterized by imperfections (e.g. wrinkles) that allow real images to be distinguished from visual deepfakes, human voices are often characterized by dysprosodic and dysphonic phenomena. The aim of this paper is therefore to shed light on possible new predictors of listener performance in perceptual experiments involving audio deepfake detection.
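
As an illustrative aside, the stimulus breakdown described above can be enumerated and checked with a short Python sketch. This is not material from the article; the group labels, field names, and per-speaker indexing are assumptions made only to make the arithmetic of the design explicit.

# Minimal sketch (assumed labels) of the 80-stimulus design:
# 2 languages x (10 dysphonic + 10 control speakers) x (natural + deepfake) = 80.
from itertools import product

stimuli = []
for language, group in product(["English", "Spanish"], ["dysphonic", "control"]):
    if group == "dysphonic":
        # 5 mild-moderate and 5 severe patients per language (CAPE-V classification)
        speakers = [("mild-moderate", i) for i in range(5)] + [("severe", i) for i in range(5)]
    else:
        speakers = [("none", i) for i in range(10)]  # control speakers, no CAPE-V rating assumed
    for severity, idx in speakers:
        for source in ["natural", "deepfake"]:  # each natural sample has a synthetic counterpart
            stimuli.append({"language": language, "group": group,
                            "severity": severity, "speaker": idx, "source": source})

assert len(stimuli) == 80                                                       # 40 natural + 40 deepfake
assert sum(s["source"] == "natural" for s in stimuli) == 40
assert sum(s["group"] == "dysphonic" and s["source"] == "natural" for s in stimuli) == 20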