Neural Tracking of Audiovisual Effects in Noise Using Deep Neural Network-Generated Virtual Humans

Abstract

This study investigates the effectiveness of Deep Neural Network (DNN)-generated virtual humans in enhancing audiovisual speech perception in noisy environments, with a focus on using neural measures to quantify these effects. Lip movements are essential for speech comprehension, especially when auditory cues are degraded by noise. Traditional recording methods produce high-quality audiovisual materials but are resource intensive. This research explores DNN-generated avatars as a promising alternative, using a commercially available tool to create realistic virtual humans. Both simple sentences and a short story were included to improve ecological validity.

Eleven young, normal-hearing participants proficient in Flemish-Dutch listened to semantically meaningful sentences and a short story presented by four speaker types: a female FACS avatar, male and female DNN avatars, and a video of a human male speaker. The study combined behavioral measures, consisting of an adaptive recall procedure and an adaptive rate procedure, with an electrophysiological measure, neural tracking.

Findings from the adaptive recall procedure showed consistent audiovisual benefits, with the human speaker offering the greatest benefit (−4.75 dB SNR), followed by the DNN avatar (−4.00 dB SNR) and the FACS avatar (−1.55 dB SNR). In the adaptive rate procedure, the DNN avatar likewise improved speech intelligibility, with average speech reception thresholds (SRTs) improving from −7.17 dB SNR (audio-only) to −9.02 dB SNR (audiovisual). The neural tracking results indicated that most participants experienced audiovisual benefits, particularly at −9 dB SNR. Together, these findings show that audiovisual cues provided by DNN avatars can enhance speech perception and validate these avatars as effective tools for studying audiovisual effects with both behavioral and electrophysiological measures.
