How Much Does the Dynamic F0 Curve Affect the Expression of Emotion in Utterances?

Tae-Jin Yoon

Abstract

The modulation of vocal elements such as pitch, loudness, and duration plays a crucial role in conveying both linguistic information and the speaker’s emotional state. While acoustic features like fundamental frequency (F0) variability have been widely studied in emotional speech analysis, challenges remain in accurately classifying emotions due to the complex and dynamic nature of vocal expressions. Traditional analytical methods often oversimplify these dynamics, potentially overlooking intricate patterns indicative of specific emotions. This study aims to enhance emotion classification in speech by directly incorporating dynamic F0 contours into the analytical framework using Generalized Additive Mixed Models (GAMMs). We utilized the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), focusing on eight distinct emotional states expressed by 24 professional actors. Sonorant segments were extracted, and F0 measurements were converted into semitones relative to a 100 Hz baseline to standardize pitch variations. By employing GAMMs, we modeled non-linear trajectories of F0 contours over time, accounting for both fixed effects (emotions) and random effects (individual speaker variability). Our analysis revealed that incorporating emotion-specific non-linear time effects and individual speaker differences significantly improved the model’s explanatory power, ultimately explaining up to 66.5% of the variance in F0. The inclusion of random smooths for time within speakers captured individual temporal modulation patterns, providing a more accurate representation of emotional speech dynamics. The results demonstrate that dynamic modeling of F0 contours using GAMMs enhances the accuracy of emotion classification in speech. This approach captures the nuanced pitch patterns associated with different emotions and accounts for individual variability among speakers. 
The findings contribute to a deeper understanding of the vocal expression of emotions and offer valuable insights for advancing speech emotion recognition systems.
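
The semitone conversion mentioned in the abstract can be illustrated with a short sketch. It uses the standard relation st = 12 · log2(F0 / reference), with the 100 Hz baseline taken from the abstract. The function name `hz_to_semitones`, the parameter names, and the sample values are hypothetical and chosen only for illustration; this is not the author's code, and the handling of unvoiced frames is an assumption.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert an F0 value in Hz to semitones relative to ref_hz.

    Hypothetical helper: 12 semitones per octave, so doubling the
    frequency adds 12 and halving it subtracts 12.
    """
    if f0_hz <= 0 or ref_hz <= 0:
        # Assumption: unvoiced or missing frames (F0 <= 0) are rejected here;
        # a real pipeline might skip or interpolate them instead.
        raise ValueError("F0 and reference must be positive frequencies in Hz")
    return 12.0 * math.log2(f0_hz / ref_hz)


# Illustrative values only, not data from the study.
contour_hz = [100.0, 200.0, 50.0]
contour_st = [hz_to_semitones(f) for f in contour_hz]
print(contour_st)
```

Expressing F0 on a logarithmic scale against a fixed reference puts speakers with different pitch ranges on a comparable footing before any contour modeling.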

Version published to 10.20944/preprints202409.2449.v1
Oct 1, 2024

Related articles

Semantic content outperforms speech prosody in predicting affective experience in naturalistic settings

This article has 8 authors:
1. Timo Kevin Koch
2. Gabriella M. Harari
3. Ramona Schoedel
4. Samuel D. Gosling
5. Zachariah Marrero
6. Florian Bemmann
7. Markus Bühner
8. Clemens Stachl
This article has no evaluations
Latest version Oct 1, 2024

The Dynamic Posed Emotional Crying Behavior Database (DPECBD): A Comprehensive Resource to Study the Multifaceted Nature of Emotional Crying

This article has 3 authors:
1. Monika Wróbel
2. Janis Zickfeld
3. Paweł Ciesielski
This article has no evaluations
Latest version Nov 3, 2024

PLGLM: Emotion Recognition in Conversation based on Prompt Learning and Global-Local speaker Modeling

This article has 3 authors:
1. Bengong Yu
2. Menglu Shao
3. Zhonghao Xi
This article has no evaluations
Latest version Oct 29, 2024
