Perceptual adaptation and transfer of learning for noise‑vocoded cloned and human voices


Abstract

Voice cloning technology has developed at a rapid pace, and current synthesis techniques produce near-humanlike voices. Recent research showed that cloned voices are up to ~13% more intelligible than their human originals in background noise. This research also showed that intelligibility in cloned voices was driven mainly by pitch and harmonic measures, whereas formant- and vowel-space measures were more important for human voices. We aimed to establish whether the intelligibility benefit for cloned voices persisted when harmonic information and fine-grained spectral detail were (largely) absent from the speech signals, and whether listeners perceptually adapted to cloned voices. We compared the intelligibility of ten six-band noise-vocoded cloned voices with that of their ten human originals. Eighty participants listened to 80 sentences (40 human, 40 cloned) in an online experiment. Listeners heard a block of 40 cloned-voice sentences and a block of 40 human-voice sentences (order counterbalanced), which allowed us to compare perceptual adaptation to both speech types and to evaluate transfer of learning effects. We found that noise-vocoded cloned voices were more intelligible than their human counterparts; listeners showed 13% higher accuracy scores. Overall, participants adapted to the vocoded speech and improved by ~8-10% over the course of 40 sentences in the first block only; this pattern held regardless of whether voices were human or cloned. We also found a positive transfer of learning effect between cloned and human voices; human voices were ~10% more intelligible when preceded by the cloned voices. These results indicate that the intelligibility benefit for cloned voices persists for noise-vocoded speech, and that listeners adapt to noise-vocoded cloned speech as they do to human speech. Our results have implications for applications of cloned voices, particularly in the development of assistive technologies.