Perceptual adaptation and transfer of learning for noise‑vocoded cloned and human voices
Abstract
Voice cloning technology has developed at a rapid pace, and current synthesis techniques produce near-humanlike voices. Recent research showed that cloned voices are up to ~13% more intelligible than their human originals in background noise. This research also showed that intelligibility in cloned voices was driven mainly by pitch and harmonic measures, whereas formant- and vowel-space measures were more important for human voices. We aimed to establish whether the intelligibility benefit for cloned voices persisted when harmonic information and fine-grained spectral detail were (largely) absent from the speech signals, and whether listeners perceptually adapted to cloned voices. We compared the intelligibility of ten six-band noise-vocoded cloned voices with their ten human originals. Eighty participants listened to 80 sentences (40 human, 40 cloned) in an online experiment. Listeners heard a block of 40 sentences spoken by cloned voices and a block of 40 sentences spoken by human voices (order counterbalanced), which allowed us to compare perceptual adaptation to both speech types and evaluate transfer-of-learning effects. We found that noise-vocoded cloned voices were more intelligible than their human counterparts; listeners showed 13% higher accuracy scores. Overall, participants adapted to the vocoded speech and improved by ~8-10% over the course of 40 sentences in the first block only; this pattern held regardless of whether voices were human or cloned. We also found a positive transfer-of-learning effect between cloned and human voices; human voices were ~10% more intelligible when preceded by the cloned voices. These results indicate that the intelligibility benefit for cloned voices persists for noise-vocoded speech, and that listeners adapt to noise-vocoded cloned speech as they do to human speech. Our results have implications for applications of cloned voices, in particular the development of assistive technologies.