A Noise-Robust End-to-End Framework for Amharic Speech Recognition

Abstract

End-to-end automatic speech recognition (ASR) offers a streamlined alternative to traditional systems that rely on multiple, separately trained language, acoustic, and pronunciation models. In this paper, we present a noise-robust, end-to-end ASR framework tailored to the Amharic language. Our approach combines a convolutional neural network (CNN), a recurrent neural network (RNN), and Connectionist Temporal Classification (CTC) to transcribe speech directly into text, avoiding labor-intensive dictionary creation. We evaluate our method on a corpus of 20,000 noisy Amharic utterances and achieve a word error rate (WER) of 7%. This result indicates that the system handles challenging acoustic conditions effectively. By reducing complexity and manual overhead, our end-to-end model offers a practical and accurate solution for real-world deployments, with broader implications for developing ASR in other low-resource and noise-prone environments.
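
To make the two technical terms in the abstract concrete, the following is a minimal Python sketch of (a) the standard best-path rule for reading a label sequence out of per-frame CTC predictions and (b) how a word error rate is commonly computed. It is not the authors' implementation: the function names `ctc_greedy_decode` and `word_error_rate`, the choice of blank index 0, and the use of integer label IDs in place of Amharic characters are all assumptions made for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def ctc_greedy_decode(frame_label_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: merge consecutive repeats, then drop blanks.

    `frame_label_ids` holds the most likely label for each audio frame,
    e.g. the argmax over the network's per-frame output distribution.
    """
    decoded: List[int] = []
    previous = None
    for label in frame_label_ids:
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    prev_row = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur_row = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev_row[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev_row[j] + 1       # reference word missing from hypothesis
            insertion = cur_row[j - 1] + 1   # extra word in hypothesis
            cur_row.append(min(substitution, deletion, insertion))
        prev_row = cur_row
    return prev_row[-1] / len(ref)
```

Under these assumptions, a WER of 7% would correspond to roughly 7 word-level substitutions, deletions, or insertions per 100 reference words. A blank between two identical labels is what lets CTC output a repeated symbol, which is why repeats are merged before blanks are removed.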
