BHASHABLEND: Bridging Transcription and Translation for Multilingual Video Content


Abstract

Translating video content into multiple languages is feasible with existing tools but remains challenging. This work presents a system that improves both quality and accessibility in multilingual video translation. The proposed pipeline extracts audio from video, transcribes it with a speech recognition model, and translates the transcribed text into multiple languages. The system uses Google's Translation API and Text-to-Speech library, keeping the dubbed audio synchronized with the original video. The BhashaBlend model achieved a word error rate of 12.4%, outperforming major ASR systems such as Google (15.82%) and Microsoft (16.51%). Performance was strongest for languages with relatively simple phonetic structure, such as German, English, and Spanish, demonstrating the model's reliability for video dubbing. These results point to BhashaBlend's potential for linguistically complex content and suggest broad applicability in multilingual settings.
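For context on the reported metric: word error rate (WER) is conventionally computed as the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the number of reference words. The sketch below illustrates this standard calculation; it is not the authors' evaluation code, and the example sentences are invented for illustration.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(wer("the cat sat on the mat", "the cat sit on mat"))
```

A WER of 12.4% thus means roughly one word in eight was substituted, deleted, or inserted relative to the reference transcript.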
