The relationship between surprisal, prosody, and backchannels in conversation reflects intelligibility-oriented pressures
Abstract
Conversation is a dynamic, multi-modal activity involving the exchange of complex streams of information such as words, prosody, gesture, eye contact, and backchannels. Understanding how these different channels interact in naturalistic scenarios is essential for understanding the mechanisms governing human communication. Past studies have suggested that the duration of words is tied to their predictability in context, but it remains unclear whether this relationship is speaker-oriented (e.g., retrieval- or production-based) or due to listener-oriented, intelligibility-based pressures (i.e., emphasizing unpredictable words to ease comprehension). This study examines the relationship between predictability and additional streams of speaker and listener behavior, to test how much intelligibility-oriented principles impact conversation. We use the GPT-2 large language model to assess the relationship between surprisal, a measure of unpredictability, and several variables known to play an important role in conversation: the prosodic features of duration, pitch, and intensity, and the timing of listener backchannels. We perform this analysis on the CANDOR corpus of naturalistic spoken video-call conversations between strangers in English. In keeping with previous results using n-gram predictability, we find that higher GPT-2 surprisal predicts significantly longer word durations. Moreover, surprisal also predicts maximum pitch and maximum intensity, even when controlling for duration. Additionally, listener backchannels were more likely to overlap high-surprisal words than low-surprisal words, suggesting that listeners provide verbal feedback and acknowledgement of unpredictable, i.e., informative, words. The results provide additional support for intelligibility-based accounts, which hold that language production is sensitive to a pressure for successful communication, not just speaker-oriented pressures.
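
For readers unfamiliar with the measure, surprisal is conventionally defined as the negative log probability of a word given its preceding context. The following is a minimal sketch of that definition, not the authors' pipeline. The abstract does not state the log base or how subword tokens were combined into words, so the base-2 logarithm (bits) and the summing of subword surprisals are assumptions. The function names and the probabilities are hypothetical and are not values from GPT-2 or the CANDOR corpus.

```python
import math


def token_surprisal(prob: float) -> float:
    """Surprisal, in bits, of a token with conditional probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def word_surprisal(token_probs: list[float]) -> float:
    """Word-level surprisal as the sum of its subword-token surprisals.

    Assumption: a word split into several subword tokens is scored by
    summing the per-token surprisals (chain rule of probability).
    """
    return sum(token_surprisal(p) for p in token_probs)


# Hypothetical conditional probabilities, for illustration only.
predictable_word = word_surprisal([0.5])            # one likely token
unpredictable_word = word_surprisal([0.25, 0.125])  # two unlikely subword tokens

print(predictable_word, unpredictable_word)
```

In practice, the per-token probabilities would come from a language model's output distribution over its vocabulary at each position. Summing over subword tokens matters because GPT-2 uses a byte-pair-encoding vocabulary that can split a single word into several tokens.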