A Hybrid Machine Translation Framework for Low-Resource Indian Languages Using Differential Programming Loss Optimization
Abstract
This paper proposes a hybrid machine translation (MT) framework for low-resource Indian languages that integrates an Iterative Data Merger (IDM), Synthetic Data Generation (SDG), and Differential Programming Loss Optimization (DPLO). The framework is evaluated on English→Bhojpuri and English→Punjabi translation tasks, with experiments conducted across legal, financial, and multidomain corpora. Results show that the proposed model consistently outperforms baseline systems and partial configurations, achieving improvements of up to +2.87% BLEU, +3.33% METEOR, and +3.00% RIBES over the baseline. Domain-specific analysis reveals that financial texts yield higher translation quality than legal texts, owing to their lower terminological complexity, while cross-lingual comparisons demonstrate that Bhojpuri benefits more than Punjabi from resource availability and script alignment with Hindi. Ablation studies confirm that IDM, SDG, and DPLO are complementary, with the full model delivering the strongest overall performance. These findings highlight the effectiveness of the proposed approach for domain-adapted translation in low-resource settings and underscore its potential for scaling to other Indian languages.