Reg2Bangla: An End-to-End Regional Speech Standardization
Abstract
This paper presents a complete approach for transcribing 20 regional Bangladeshi dialects into standard Bangla text. We fine-tuned the tugstugi bengaliai regional asr whisper medium model, itself a fine-tuned variant of OpenAI's Whisper model trained on external datasets. We then further fine-tuned this model on a corpus of 3,350 dialectal audio recordings, using the annotated label texts in the training set as targets. We also applied post-processing with an n-gram KenLM language model and used key-value (KV) cache optimization to speed up inference on the audio input. The proposed methodology addresses major challenges that arise from regional variation, pronunciation differences, and vocabulary shifts across linguistic communities in Bangladesh. The pipeline performs strongly on the evaluation set and demonstrates that combining pretrained multilingual speech models with targeted fine-tuning, supported by post-processing, can effectively handle the complexity of Bangladeshi dialectal speech.
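The abstract says an n-gram KenLM language model is used for post-processing but does not say how its scores are combined with the Whisper output. One common option is n-best rescoring: each candidate transcript's acoustic log-probability is added to a weighted language-model score, and the highest total wins. The sketch below assumes that approach; it is not confirmed as the authors' method. `BigramLM`, `rescore`, `lm_weight`, and `word_bonus` are hypothetical names, and the toy add-k bigram model stands in for a real KenLM model (whose Python `Model.score` likewise returns a base-10 log probability). The n-best list and its acoustic scores are made-up illustrative values.

```python
import math
from collections import Counter


class BigramLM:
    """Toy word-level bigram LM with add-k smoothing.

    Hypothetical stand-in for a KenLM n-gram model: like KenLM's score,
    log10_prob returns a base-10 log probability of the whole sentence.
    """

    def __init__(self, sentences, k=0.5):
        self.k = k
        self.context = Counter()  # how often each token appears as a bigram's left side
        self.bigrams = Counter()  # (previous, current) pair counts
        vocab = {"</s>"}
        for sent in sentences:
            words = sent.split()
            vocab.update(words)
            tokens = ["<s>"] + words + ["</s>"]
            self.context.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        # +1 reserves probability mass for one out-of-vocabulary class
        self.vocab_size = len(vocab) + 1

    def log10_prob(self, sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            num = self.bigrams[(prev, cur)] + self.k
            den = self.context[prev] + self.k * self.vocab_size
            total += math.log10(num / den)
        return total


def rescore(nbest, lm, lm_weight=0.5, word_bonus=0.0):
    """Return the n-best text that maximizes acoustic + weighted LM score.

    nbest: list of (text, acoustic_logprob) pairs with natural-log scores.
    """
    def combined(item):
        text, acoustic = item
        lm_ln = lm.log10_prob(text) * math.log(10)  # convert log10 to natural log
        return acoustic + lm_weight * lm_ln + word_bonus * len(text.split())

    return max(nbest, key=combined)[0]


if __name__ == "__main__":
    # Tiny "standard Bangla" text corpus for the toy LM (illustrative only)
    lm = BigramLM(["আমি ভাত খাই", "আমি কাজ করি", "তুমি ভাত খাও"])
    # Hypothetical n-best list: the acoustic model slightly prefers the second form
    nbest = [("আমি ভাত খাই", -4.1), ("আমি ভাত খান", -4.0)]
    print(rescore(nbest, lm))                 # prints আমি ভাত খাই
    print(rescore(nbest, lm, lm_weight=0.0))  # prints আমি ভাত খান
```

With `lm_weight=0.0` the choice falls back to the acoustic score alone; with a positive weight the language model steers the output toward word sequences seen in standard Bangla text. In practice `lm_weight` and `word_bonus` would be tuned on held-out data, and the same combined score can instead be applied inside beam search (shallow fusion) rather than after it.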