A BLEU-Based Comparative Analysis of Human and ChatGPT 4.0 Translation in Kumpulan Lagu dan Cerita Anak-Anak Dwibahasa
Abstract
This study compares the translation quality of human translators and ChatGPT 4.0 using the Bilingual Evaluation Understudy (BLEU) metric, focusing on twelve stories from Kumpulan Lagu dan Cerita Anak-Anak Dwibahasa. It examines how closely ChatGPT 4.0's translations align with human translations in terms of lexical and structural similarity. The methodology comprises four main stages: preparing human and machine translation outputs, performing tokenization, calculating n-gram precision, and computing final BLEU scores as the geometric mean of n-gram precisions adjusted by a brevity penalty. The findings reveal that ChatGPT 4.0 consistently produced translations that were longer and more stylistically elaborate than the human references, yielding BLEU scores ranging from 0.4859 to 0.9068. These results indicate that although ChatGPT 4.0 can generate fluent and contextually appropriate translations, its outputs do not closely match human translations at the n-gram level. The study concludes that BLEU remains effective for measuring surface-level similarity but is limited in capturing the stylistic and interpretive aspects of AI-generated translation of children's literature.
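The four-stage pipeline described in the abstract (tokenization, n-gram precision, geometric mean, brevity penalty) can be sketched as a minimal sentence-level BLEU computation. This is an illustrative implementation of the standard BLEU formula, not the authors' actual evaluation code; the whitespace tokenizer and example sentences are assumptions for demonstration.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) multiplied by a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        ref_counts = Counter(ngrams(reference, n))
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: penalize candidates shorter than the reference.
    r, c = len(reference), len(candidate)
    bp = 1.0 if c >= r else math.exp(1 - r / c)
    return bp * geo_mean

# Hypothetical example: tokenization here is simple whitespace splitting.
ref = "the cat sat on the mat".split()
cand = "a cat sat on the mat quietly".split()
print(round(bleu(ref, cand), 4))
```

Note that a longer machine output is not penalized by the brevity penalty (which only punishes short candidates); instead, extra words lower the clipped n-gram precisions, which is consistent with the lower scores the study reports for ChatGPT 4.0's more elaborate outputs.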