Cross-Cultural Speech Emotion Recognition for L2 Pronunciation Training
Abstract
Improving second language (L2) pronunciation is crucial for enhancing learners' communicative competence, especially with respect to emotional expression. While current L2 training systems focus on pronunciation accuracy, they often overlook emotional fluency, which is essential for natural communication. This study addresses this gap by integrating cross-cultural speech emotion recognition (SER) into L2 pronunciation training. Using multilingual emotion recognition models, we assess how non-native speakers express emotions in a target language and compare their emotional delivery to that of native speakers. The study employs speech emotion datasets spanning multiple languages and cultures to train and evaluate the model. The results show that non-native speakers often exhibit discrepancies in emotional expression relative to native speakers, particularly in the tonal and prosodic features associated with emotions. The proposed model provides personalized feedback on both pronunciation accuracy and emotional expressiveness, offering a more holistic approach to L2 pronunciation training. This research demonstrates that incorporating emotion recognition into L2 learning systems can improve both linguistic and emotional fluency, making AI-driven language training tools more effective for learners. Ultimately, this work contributes to enhancing the quality of L2 education by addressing both pronunciation and emotional expression.