Cross-Cultural Speech Emotion Recognition for L2 Pronunciation Training

Abstract

Improving second language (L2) pronunciation is crucial for learners' communicative competence, especially for emotional expression. Current L2 training systems focus on pronunciation accuracy but often overlook emotional fluency, which is essential for natural communication. This study addresses that gap by integrating cross-cultural speech emotion recognition (SER) into L2 pronunciation training. Using multilingual emotion recognition models, we assess how non-native speakers express emotions in a target language and compare their emotional delivery with that of native speakers. The study employs speech emotion datasets spanning multiple languages and cultures to train and evaluate the model. The results show that non-native speakers' emotional expression often diverges from that of native speakers, particularly in the tonal and prosodic features associated with emotions. The proposed model provides personalized feedback on both pronunciation accuracy and emotional expressiveness, offering a more holistic approach to L2 pronunciation training. This research demonstrates that incorporating emotion recognition into L2 learning systems can improve both linguistic and emotional fluency, making AI-driven language training tools more effective for learners. Ultimately, this work contributes to L2 education by addressing pronunciation and emotional expression together.
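
The abstract does not specify an implementation, but the comparison of tonal and prosodic features it describes can be illustrated with a minimal sketch. The Python snippet below is an assumption-laden illustration, not the authors' method: it uses librosa for audio analysis, and the file name and native reference statistics are hypothetical. It extracts pitch and energy statistics from a learner utterance and scores their deviation from aggregated native-speaker norms, the kind of signal a personalized feedback module could threshold.

    import librosa
    import numpy as np

    def extract_prosody(path, sr=16000):
        """Extract simple prosodic statistics (pitch and energy) from an audio file."""
        y, sr = librosa.load(path, sr=sr)
        # Fundamental frequency via probabilistic YIN; unvoiced frames come back as NaN.
        f0, _, _ = librosa.pyin(
            y,
            fmin=librosa.note_to_hz("C2"),
            fmax=librosa.note_to_hz("C7"),
            sr=sr,
        )
        # Frame-level RMS energy as a rough loudness/intensity proxy.
        rms = librosa.feature.rms(y=y)[0]
        return {
            "f0_mean": float(np.nanmean(f0)),   # average pitch
            "f0_std": float(np.nanstd(f0)),     # pitch variability (prosodic range)
            "rms_mean": float(np.mean(rms)),    # average energy
            "rms_std": float(np.std(rms)),      # energy variability
        }

    def prosodic_deviation(learner, native_ref):
        """Per-feature z-scores of a learner's prosody against native norms.

        native_ref maps each feature name to (mean, std) aggregated over
        native speakers for the same emotion and utterance type; a feedback
        module could threshold these scores to flag, e.g., a narrower pitch
        range than native speakers typically use when expressing the emotion.
        """
        return {k: (learner[k] - mu) / sigma
                for k, (mu, sigma) in native_ref.items()}

    # Hypothetical usage: compare one learner utterance to native norms.
    native_norms = {"f0_mean": (180.0, 25.0), "f0_std": (45.0, 12.0),
                    "rms_mean": (0.06, 0.02), "rms_std": (0.03, 0.01)}
    print(prosodic_deviation(extract_prosody("learner_happy.wav"), native_norms))

In a full system of the kind the abstract describes, such hand-crafted prosodic statistics would sit alongside the multilingual SER model's emotion predictions rather than replace them.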
