A Hybrid Machine Translation Framework for Low-Resource Indian Languages Using Differential Programming Loss Optimization

Abstract

This paper proposes a hybrid machine translation (MT) framework for low-resource Indian languages that integrates three components: an Iterative Data Merger (IDM), Synthetic Data Generation (SDG), and Differential Programming Loss Optimization (DPLO). The framework is evaluated on English→Bhojpuri and English→Punjabi translation tasks, with experiments conducted across legal, financial, and multidomain corpora. The proposed model consistently outperforms both the baseline systems and partial configurations of the framework, achieving improvements of up to +2.87% BLEU, +3.33% METEOR, and +3.00% RIBES over the baseline. Domain-specific analysis shows that financial texts yield higher translation quality than legal texts, owing to their lower terminological complexity. Cross-lingual comparison shows that Bhojpuri benefits more than Punjabi from resource availability and script alignment with Hindi. Ablation studies confirm that IDM, SDG, and DPLO are complementary, with the full model delivering the strongest overall performance. These findings highlight the effectiveness of the proposed approach for domain-adapted translation in low-resource settings and underscore its potential for scaling to other Indian languages.
