Towards Speech Technology for Garo: A Low-Resource ASR System via Multilingual Transfer

Badal Nyalang
Kathy Biginchi Ch Momin

Abstract

We present a fine-tuned Whisper model for automatic speech recognition (ASR) in Garo, a low-resource Tibeto-Burman language spoken in Northeast India. Using training samples from the Vaani dataset, we fine-tune Whisper-small and achieve a Word Error Rate (WER) of 9.74% and Character Error Rate (CER) of 3.82% on the test set, representing a 97.5% relative improvement over the zero-shot baseline. Our model produces perfect transcriptions for over 60% of test samples and achieves real-time inference speeds. We analyze error patterns including code-switching challenges and morphological complexities specific to Garo. The model is publicly released to support future research in low-resource speech recognition for Tibeto-Burman languages.
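The abstract reports WER, CER, and a relative improvement over a baseline. As a rough illustration of how such figures are typically computed, the sketch below uses a plain Levenshtein edit distance. It is not the authors' evaluation code: the function names are hypothetical, the text normalization (whitespace tokenization, spaces counted as characters in CER) is an assumption, and the abstract does not state which metric the 97.5% figure is measured on.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; spaces are counted as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def relative_improvement(baseline, improved):
    """Fractional error reduction relative to the baseline error rate."""
    return (baseline - improved) / baseline


# Placeholder tokens, not real Garo text.
print(wer("a b c d", "a x c d"))   # one substitution over four words
print(cer("abcd", "abxd"))         # one substitution over four characters
```

Because insertions count as errors, WER can exceed 100% when a model outputs far more words than the reference contains, which is why a zero-shot baseline can sit well above 1.0 in these units.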

Version published to 10.20944/preprints202602.0686.v1
Feb 9, 2026

Related articles

Reg2Bangla: An End-to-End Regional Speech Standardization

This article has 7 authors:
1. Samiul Basir Bhuiyan
2. Md Sazzad Hossain Adib
3. Mohammed Aman Bhuiyan
4. Aritra Islam Saswato
5. Ahmed Faizul Haque Dhrubo
6. Mohammad Ashrafuzzaman Khan
7. Mohammad Abdul Qayum
This article has no evaluations. Latest version: Mar 17, 2026

Fine-Tuning Whisper for American English Air Traffic Control Speech Recognition: A Data-Efficient Pipeline

This article has 2 authors:
1. Jeffrey Su
2. Omar Haq
This article has no evaluations. Latest version: Feb 26, 2026

End-to-End ASR Conformers: Revolutionizing Hearing-to-Speech-to-Writing Language Processing Frameworks

This article has 1 author:
1. R. Karthick
This article has no evaluations. Latest version: Feb 26, 2026
