Towards Speech Technology for Garo: A Low-Resource ASR System via Multilingual Transfer

Abstract

We present a fine-tuned Whisper model for automatic speech recognition (ASR) in Garo, a low-resource Tibeto-Burman language spoken in Northeast India. Using training samples from the Vaani dataset, we fine-tune Whisper-small and achieve a Word Error Rate (WER) of 9.74% and a Character Error Rate (CER) of 3.82% on the test set, a 97.5% relative improvement over the zero-shot baseline. Our model produces perfect transcriptions for over 60% of test samples and runs at real-time inference speeds. We analyze error patterns, including code-switching challenges and morphological complexities specific to Garo. The model is publicly released to support future research in low-resource speech recognition for Tibeto-Burman languages.
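For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how WER, CER, and a relative improvement are commonly computed. It is not the authors' evaluation code. The function names are hypothetical, the example strings are English placeholders rather than Garo data, and details such as text normalization, whitespace handling in CER, and corpus-level aggregation are assumptions that may differ from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; spaces are counted as characters here (an assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def relative_improvement(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Placeholder strings for illustration only, not drawn from the Vaani dataset.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on mat"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

This sketch scores a single utterance. Test-set figures are usually obtained by summing edits and reference lengths over all utterances before dividing, which generally differs from averaging per-utterance rates. Because insertions are counted, WER can exceed 100% when a model produces far more words than the reference contains.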
