Striking the Right Chord: How Vocal and Visual Cues Shape Learners’ Deep Cognitive Engagement in Video-Based Learning
Abstract
As video-based learning becomes ubiquitous, understanding how video cues affect learner engagement is critical. However, little is known about the nuanced, interactive effects of instructors’ auditory and visual signals. Drawing on Emotions as Social Information (EASI) theory and Social Presence theory, this study investigates whether vocal characteristics (pitch level, intensity level, pitch variability, and intensity variability) have non-linear relationships with learners’ deep cognitive engagement, and how the instructor’s on-screen presence moderates these effects. Analyzing 40,742 observations from the major video platform Bilibili, we find that pitch level, pitch variability, and intensity variability have inverted U-shaped relationships with engagement, suggesting an optimal “sweet spot”. In contrast, intensity level shows a U-shaped relationship. Crucially, instructor presence amplifies all of these vocal effects. Our study contributes to the cognitive engagement literature by providing a multimodal analysis of audio-visual interactions in online learning, and it offers actionable guidelines for creators to optimize their delivery on video platforms.
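The abstract does not state the estimation model, so the following is only a minimal sketch of how an inverted U-shaped effect and its moderation by instructor presence are commonly read off a quadratic specification. It assumes a model of the form engagement = b0 + b1*x + b2*x^2 + (interaction terms with a 0/1 presence indicator), where x is a standardized vocal feature. All function names and coefficient values are hypothetical and are not estimates from the study.

```python
# Illustrative sketch only: the coefficients below are made up,
# not results reported in the preprint.

def curve_shape(b2: float) -> str:
    """Classify y = b0 + b1*x + b2*x**2 by the sign of the squared term."""
    if b2 < 0:
        return "inverted U"
    if b2 > 0:
        return "U"
    return "linear"


def turning_point(b1: float, b2: float) -> float:
    """Return the x value where the quadratic peaks (or bottoms out)."""
    if b2 == 0:
        raise ValueError("no turning point when the squared term is zero")
    return -b1 / (2 * b2)


def effective_coefficients(b1, b2, b1_x_presence, b2_x_presence, presence):
    """Combine main effects with presence interactions (presence is 0 or 1)."""
    return (b1 + b1_x_presence * presence, b2 + b2_x_presence * presence)


# Hypothetical coefficients for one vocal feature, e.g. pitch variability.
B1, B2 = 0.8, -0.2
B1_X_PRESENCE, B2_X_PRESENCE = 0.4, -0.1

for presence in (0, 1):
    b1_eff, b2_eff = effective_coefficients(
        B1, B2, B1_X_PRESENCE, B2_X_PRESENCE, presence
    )
    print(presence, curve_shape(b2_eff), turning_point(b1_eff, b2_eff), b2_eff)
```

Under these assumptions, a negative squared-term coefficient corresponds to the "sweet spot" pattern, and an interaction coefficient with the same sign as the squared term makes the curve steeper when the instructor is on screen, which is one way to interpret "amplifies".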