Advancing Audio-Visual Attention Analysis in 360° Videos Through Real-Time Visualization
Abstract
This study presents an open-source, real-time visualization tool designed to analyse audio-visual attention in 360° video environments under varying sound conditions. Traditional methods, such as static saliency maps and post-hoc analyses, often fail to capture the dynamic and participant-specific nature of attention shifts in immersive environments. To address these limitations, the proposed tool dynamically integrates head pose fixation maps with sound intensity heatmaps, enabling real-time tracking of attention patterns across four audio conditions: No Sound (NS), Stereo (ST), First-Order Ambisonics (FO), and Third-Order (higher-order) Ambisonics (HO). Attention shifts across sound conditions were quantified using the Jaccard Index, which measures the overlap of the top 5% most-viewed regions across participants. The results demonstrate that increasing auditory complexity, from silence to spatial audio, significantly broadens visual exploration. FO led to the most dispersed attention patterns, with a 62.4% reduction in attention overlap indoors and 58.8% outdoors compared to NS. HO resulted in a 61.2% reduction indoors and 52.0% outdoors, suggesting that while FO encourages broader exploration, HO facilitates a more focused distribution of attention. Notably, relative to FO, HO led to a relative increase in attention overlap of 3.2% indoors and 16.6% outdoors, indicating that higher-order spatial audio helps guide attention more precisely in complex environments. Unlike conventional approaches, which rely on static analyses, this tool provides real-time, participant-specific insights into attention shifts, offering a dynamic perspective on how spatial audio influences exploration. These capabilities give VR content creators and researchers actionable insights for optimizing spatial audio design and enhancing user engagement.
By offering a robust and adaptable framework, this study advances the understanding of audio-visual interactions in immersive media environments.
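The overlap measure described above can be illustrated with a minimal sketch. It assumes each fixation map is a 2D grid of view counts, takes the top 5% of cells by value as the "most-viewed regions", and computes the Jaccard Index between the two resulting sets of cells. The grid representation, the tie-breaking rule, and the function names (`top_fraction_cells`, `jaccard_index`, `attention_overlap`) are assumptions made for illustration; the abstract does not specify how the tool discretizes the maps or aggregates overlap across participants.

```python
from typing import List, Set

Grid = List[List[float]]  # assumed representation: rows of per-cell view counts


def top_fraction_cells(fixation_map: Grid, fraction: float = 0.05) -> Set[int]:
    """Return the flat indices of the most-viewed cells (top `fraction` of the grid)."""
    flat = [value for row in fixation_map for value in row]
    # Keep at least one cell so very small grids still yield a region.
    k = max(1, round(len(flat) * fraction))
    # Sort indices by descending value; ties are broken by index (an assumption)
    # so the result is deterministic.
    order = sorted(range(len(flat)), key=lambda i: (-flat[i], i))
    return set(order[:k])


def jaccard_index(a: Set[int], b: Set[int]) -> float:
    """Jaccard Index |A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def attention_overlap(map_a: Grid, map_b: Grid, fraction: float = 0.05) -> float:
    """Overlap of the most-viewed regions of two fixation maps of equal shape."""
    return jaccard_index(
        top_fraction_cells(map_a, fraction),
        top_fraction_cells(map_b, fraction),
    )
```

Under these assumptions, a value of 1.0 means the most-viewed regions coincide exactly and 0.0 means they are disjoint, so a lower value between two sound conditions corresponds to a larger shift in attention.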