Immersive Augmented Reality Music Interaction through Spatial Scene Understanding and Hand Gesture Recognition
Abstract
Augmented-reality (AR) music experiences have largely been confined to scripted interactions or fully virtual worlds, limiting the expressive potential that arises when performers engage directly with their physical surroundings. We present Scene-Aware Gesture-Driven Music (SAGeM), an AR application for Meta Quest 3 that unifies on-device scene understanding with real-time, six-degree-of-freedom hand tracking to support embodied, spontaneous musical creation. SAGeM continuously reconstructs a lightweight semantic mesh of the user’s environment and overlays dynamic audio affordances onto everyday surfaces. When a performer taps, punches, or claps near recognised objects, a low-latency gesture-classification pipeline triggers context-dependent percussion, allowing users to "play the room" as an instrument. In a formative study (n = 8), participants reported a strong sense of presence, rated usability highly (mean SUS = 82.1), and described the experience as "making music with my own space." Quantitative profiling shows an average end-to-end on-device latency of 19 ms (90th percentile: 24 ms). We discuss design lessons, remaining challenges for robust large-scale deployment, and future extensions toward adaptive soundscapes and cooperative performance.
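To make the surface-to-sound mapping described above concrete, the sketch below shows one plausible way a classified gesture occurring near a semantically labelled mesh patch could be resolved to a percussion sample. This is a minimal, hypothetical Python sketch, not the paper's on-device implementation: the names (`SurfacePatch`, `SAMPLE_MAP`, `resolve_sample`), the proximity threshold, and the label set are our assumptions for illustration.

```python
import math
import time
from dataclasses import dataclass
from enum import Enum, auto


class Gesture(Enum):
    """Gesture classes named in the abstract (taps, punches, claps)."""
    TAP = auto()
    PUNCH = auto()
    CLAP = auto()


@dataclass
class SurfacePatch:
    """A fragment of the semantic scene mesh: a label plus a world-space centroid."""
    label: str                         # e.g. "table", "wall", "couch" (assumed label set)
    position: tuple[float, float, float]  # centroid (x, y, z) in metres


# Hypothetical mapping from (surface label, gesture) to a percussion sample;
# "any" acts as a fallback when no labelled surface is nearby.
SAMPLE_MAP = {
    ("table", Gesture.TAP):   "snare.wav",
    ("table", Gesture.PUNCH): "kick.wav",
    ("wall",  Gesture.TAP):   "tom.wav",
    ("couch", Gesture.PUNCH): "low_tom.wav",
    ("any",   Gesture.CLAP):  "hihat.wav",
}

PROXIMITY_M = 0.15  # assumed trigger radius around a recognised surface


def dist(a, b):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def resolve_sample(gesture, hand_pos, patches):
    """Pick the percussion sample for a gesture, using the closest nearby surface."""
    near = [p for p in patches if dist(hand_pos, p.position) < PROXIMITY_M]
    if not near:
        return SAMPLE_MAP.get(("any", gesture))
    closest = min(near, key=lambda p: dist(hand_pos, p.position))
    return SAMPLE_MAP.get((closest.label, gesture)) or SAMPLE_MAP.get(("any", gesture))


if __name__ == "__main__":
    patches = [SurfacePatch("table", (0.0, 0.7, 0.5)),
               SurfacePatch("wall",  (1.5, 1.2, 0.0))]
    t0 = time.perf_counter()
    sample = resolve_sample(Gesture.TAP, (0.05, 0.72, 0.48), patches)
    lookup_ms = (time.perf_counter() - t0) * 1000
    print(f"trigger {sample} (lookup took {lookup_ms:.3f} ms)")
```

In the real system this lookup would be one small stage of the end-to-end path (hand tracking, gesture classification, mesh query, audio output) whose total latency the abstract reports as 19 ms on average; the sketch only illustrates the context-dependent sample resolution step.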