Head-motion and eye-gaze behavior reveal audio-visual target search strategies

Abstract

Auditory and visual information help us identify and localize objects in space. While the visual system has high spatial resolution in the fovea, its accuracy decreases in the periphery. Auditory localization is less accurate than visual foveal localization but effectively processes information from all directions. The auditory system has therefore been argued to guide visual localization: it identifies the approximate region of the target (‘field-of-view localization’; FOV), whereas the visual system localizes the target within that region (‘target localization’). In the present study, we investigated how the auditory and visual systems contribute to these localization processes and how these strategies are affected by the complexity of the auditory scene. Scene complexity was increased by adding auditory distractors, i.e., non-target auditory sources. Seven normal-hearing listeners participated in an audio-only, a visual-only, and an audio-visual localization experiment in which the number of auditory distractors (0, 1, 2, 3, 5, 7, or 11) was varied. The participants’ task was to localize the target as quickly as possible. Behavioral measures, including localization accuracy, response time, eye gaze, and head motion, were tracked and analyzed. The results showed that with fewer than seven auditory distractors, the FOV localization time could be well described by auditory perception alone. With seven or more distractors, however, audio-visual information was beneficial for localization: audio-visual FOV localization times were shorter than those in the audio-only conditions. Furthermore, the target localization time was consistently shorter in the audio-visual conditions than in the visual-only and audio-only conditions. The head-motion data were similar in the audio-visual and audio-only conditions when the number of auditory distractors was low. As the number of distractors increased, however, participants more often moved their heads in the wrong (non-target) direction, similar to the behavior observed in the visual-only conditions. Overall, the data suggest that the interaction between auditory and visual processing is more complex than the ‘auditory-guidance’ hypothesis would predict: the human perceptual system adjusts its search strategy to the complexity of the scene.
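As a concrete illustration of the two-stage measure described above, the sketch below splits a trial’s response time into an FOV localization time (the moment the target first falls within a gaze-centered region) and a target localization time (the remainder until the response). The function name, the 10-degree FOV radius, and the synthetic gaze trace are assumptions for illustration only; they are not taken from the study.

```python
import numpy as np

def split_localization_times(t, gaze_az, target_az, response_t, fov_radius=10.0):
    """Split a trial's response time into FOV and target localization times.

    t          : (n,) sample timestamps in seconds, trial onset at t[0]
    gaze_az    : (n,) gaze azimuth in degrees at each timestamp
    target_az  : scalar, target azimuth in degrees
    response_t : time of the participant's localization response
    fov_radius : illustrative FOV half-width in degrees (assumed value)
    """
    # Angular distance between gaze and target, wrapped to [0, 180] degrees.
    err = np.abs((gaze_az - target_az + 180.0) % 360.0 - 180.0)

    # First sample at which the target lies inside the gaze-centered FOV.
    inside = np.flatnonzero(err <= fov_radius)
    if inside.size == 0:
        return None  # gaze never reached the target's vicinity

    t_fov = t[inside[0]] - t[0]           # FOV localization time
    t_target = response_t - t[inside[0]]  # target localization time
    return t_fov, t_target

# Synthetic example: gaze sweeps from 60 deg toward a target at 0 deg
# over one second; the localization response is given at 1.2 s.
t = np.linspace(0.0, 1.0, 101)
gaze_az = 60.0 * (1.0 - t)
print(split_localization_times(t, gaze_az, target_az=0.0, response_t=1.2))
```

Under this decomposition, the abstract’s findings correspond to shorter `t_fov` in audio-visual than audio-only conditions once seven or more distractors are present, and consistently shorter `t_target` in audio-visual conditions than in either unimodal condition.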
