A Spectrogram and Local Feature-Assisted Convolutional Neural Network for Amharic Speech Emotion Identification


Abstract

Speech Emotion Recognition (SER) plays a significant role in improving human-computer interaction and human-human communication. However, speech emotion recognition in low-resource languages such as Amharic remains difficult because of the scarcity of datasets and the language's dialectal diversity. In this paper, a Convolutional Neural Network (CNN)-based approach that combines spectrogram features with local acoustic features, namely Mel-Frequency Cepstral Coefficients (MFCCs), chroma, zero-crossing rate (ZCR), energy, and pitch, is proposed for efficient Amharic speech emotion recognition. A dataset of 1,650 three-second Amharic speech samples was created and labeled with five emotional classes: anger, fear, happy, neutral, and sad. Preprocessing methods such as spectral subtraction and wavelet denoising were applied to improve signal quality and accelerate training. The experimental results show that the proposed CNN-based approach achieves a classification accuracy of 90 percent, outperforming the recurrent neural network-based approaches: Long Short-Term Memory (LSTM) with 58.48 percent, Bidirectional Long Short-Term Memory (BiLSTM) with 63.33 percent, and Gated Recurrent Unit (GRU) with 40 percent, as well as the single-feature models: local acoustic features with 73 percent and spectrogram features with 79 percent. These results confirm that integrating spectrogram and local acoustic features within a CNN architecture improves accuracy and efficiency in speech emotion recognition for low-resource languages, setting a benchmark for future Amharic SER research.
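To make the feature families concrete, the following is a minimal NumPy-only sketch of two of the local acoustic features named above (zero-crossing rate and short-time energy) together with a magnitude spectrogram. The sample rate, frame length, and hop size here are assumed illustrative values, not the paper's actual configuration, and the paper's pipeline may well use a dedicated audio library instead.

```python
import numpy as np

SR = 16000   # assumed sample rate (Hz), not stated in the abstract
FRAME = 400  # 25 ms frame at 16 kHz (assumption)
HOP = 160    # 10 ms hop (assumption)

def frames(signal, frame=FRAME, hop=HOP):
    """Slice a 1-D signal into overlapping frames."""
    n = 1 + (len(signal) - frame) // hop
    return np.stack([signal[i * hop : i * hop + frame] for i in range(n)])

def zero_crossing_rate(signal):
    """Fraction of consecutive-sample sign changes in each frame."""
    f = frames(signal)
    return np.mean(np.abs(np.diff(np.sign(f), axis=1)) > 0, axis=1)

def short_time_energy(signal):
    """Mean squared amplitude of each frame."""
    return np.mean(frames(signal) ** 2, axis=1)

def magnitude_spectrogram(signal):
    """|STFT|: Hann-windowed real FFT of each frame."""
    f = frames(signal) * np.hanning(FRAME)
    return np.abs(np.fft.rfft(f, axis=1))

# Example input: a 3-second 440 Hz tone, matching the paper's 3 s clip length.
t = np.arange(3 * SR) / SR
tone = np.sin(2 * np.pi * 440 * t)

zcr = zero_crossing_rate(tone)        # shape (298,)
energy = short_time_energy(tone)      # shape (298,)
spec = magnitude_spectrogram(tone)    # shape (298, 201)
```

In the fused model described above, per-frame vectors such as these would be combined with the 2-D spectrogram as CNN input; the exact fusion scheme is detailed in the full article.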