BnVITS: A Voice Cloning Approach for Single Speaker Text-to-Speech

Udoy Das
Md. Saiful Islam
Hasan Murad
Muhammad Ibrahim Khan
Mehadi Hasan Menon
Tareq Muntasir


Abstract

Although voice cloning and text-to-speech (TTS) models have made significant progress, especially in generating natural-sounding speech, low-resource languages such as Bangla (Bn) remain nearly unexplored. Despite recent advances, TTS systems for Bangla still struggle with the language's intricate phonology and morphology. Furthermore, no previous work has addressed voice cloning for Bangla. To fill this research gap, we present a voice cloning method that builds a Bangla TTS system from as little speech data as possible. Additionally, we introduce PYBANGLA, a text normalization tool created specifically for Bangla language processing. Voice cloning can be accomplished by fine-tuning top-performing TTS models with just a few target-speaker samples. We assess the system with both subjective and objective evaluation metrics, and the results show that our BnVITS model outperforms the earlier Bangla TTS model. This approach opens new opportunities for personalized voice technology by paving the way for Bangla TTS approaches that use speech data more efficiently.
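To illustrate the kind of preprocessing a Bangla text normalization step might perform before synthesis, here is a minimal Python sketch. It is not the PYBANGLA API, which the abstract does not describe; the function name `normalize_bangla` and the three rules chosen (Unicode NFC normalization, mapping Bangla digits to ASCII digits, and collapsing whitespace) are assumptions made for illustration. A real TTS front end would typically go further, for example by expanding numbers into words.

```python
import re
import unicodedata

# Translation table from Bangla digits (U+09E6..U+09EF) to ASCII digits.
# Hypothetical rule set for illustration only; not taken from PYBANGLA.
BANGLA_DIGITS = str.maketrans(
    "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef",
    "0123456789",
)


def normalize_bangla(text: str) -> str:
    """Apply a few simple, assumed normalization rules to Bangla text."""
    # Bring the text to a single canonical Unicode form.
    text = unicodedata.normalize("NFC", text)
    # Replace Bangla digits with their ASCII equivalents.
    text = text.translate(BANGLA_DIGITS)
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()
```

Calling `normalize_bangla` on a string with Bangla digits and irregular spacing returns the same text with ASCII digits and single spaces, which a later stage could then verbalize.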

Version published to 10.21203/rs.3.rs-6530449/v1 on Research Square
May 13, 2025

Related articles

Comparative Analysis of Vosk Toolkit and Other Speech Recognition Frameworks for Custom Language Model Implementation

This article has 2 authors:
1. Owen Graham
2. Matt Percy
This article has no evaluations
Latest version May 9, 2025

Spoofing-robust speaker verification based on time-domain embedding

This article has 3 authors:
1. Avishai Weizman
2. Yehuda Ben-Shimol
3. Itshak Lapidot
This article has no evaluations
Latest version May 11, 2025

Hate Speech Detection in Roman Urdu English Tweets Through Data Pre-processing

This article has 3 authors:
1. Muhammad Asif Khan
2. Jazib e nazar
3. GuohHua Liu
This article has no evaluations
Latest version May 7, 2025
