Multimodal prosody in interaction: Scope, significance and trajectory

Abstract

This paper aims to provide a comprehensive account of one of the most fundamental aspects of human communication, viz. prosody. Prosody refers to the expressive elements of speech that shape interaction through acoustic cues such as fundamental frequency (pitch), intensity (loudness), and duration (length), and covers a variety of phenomena, including stress, tone, intonation, and rhythm. While often characterized as an acoustic phenomenon, prosody is equally expressed through visual channels, which has led some scholars to develop modality-neutral prosodic frameworks. Nevertheless, compared to vocal prosody, visual prosody remains relatively underexplored and lacks well-defined taxonomies; it is instead characterized by kinesic cues such as head nods, manual beats, and eyebrow raises. Visual prosody parallels vocal prosody in both form and function, and both support diverse communicative functions. However, the field lacks a systematic investigation of multimodal prosody at the structural, operational, and neuronal levels. Research on multimodal prosody requires cross-disciplinary expertise, yet differences in nomenclature complicate research and delay implementation. Furthermore, language is often measured at a categorical level, while prosody exhibits continuous signal properties across the auditory and visual domains. These continuous variations are less concrete and therefore harder to observe and analyze than the more tangible elements of language. At the same time, recent advances in prosody analysis, especially in the visual modality, allow researchers to revisit classic questions about prosody with richer data, but the field lacks a roadmap. This makes the present moment especially timely for a comprehensive review that unifies perspectives, assesses emerging issues, and guides future research.
By systematically examining how prosodic expressivity is built multimodally, this paper will not only consolidate scattered knowledge and establish a consistent prosodic ontology, but also provide an essential resource for a broad readership interested in communication, cognition, and social interaction.
