Vibe Coding in Vernacular Contexts: A Comprehensive Study on Tamil and Global Implications for Multilingual Programming Education
Abstract
Large Language Models (LLMs) such as GPT-4, Claude, LLaMA, and PaLM have demonstrated remarkable performance in code generation, debugging, and problem-solving tasks. However, virtually all existing benchmarks and evaluation frameworks assume English-dominant prompts and interactions. This linguistic bias raises profound questions about accessibility and equity for the billions of learners worldwide who operate primarily in vernacular languages or bilingual contexts. This paper presents a comprehensive investigation into how state-of-the-art LLMs interpret, generate, and adapt programming code when task descriptions, constraints, and stylistic preferences are expressed in Tamil, a representative Dravidian language spoken by over 75 million people globally. Through systematic evaluation of multilingual LLMs across diverse programming scenarios, we uncover both promising capabilities and critical limitations that directly affect non-native English speakers' learning experiences. Our findings reveal systematic strengths in keyword recognition and the translation of basic algorithmic logic, alongside concerning weaknesses, including semantic drift in complex explanations, language inconsistencies between code and comments, and tokenization challenges in mixed-script environments. These results have immediate implications for the roughly 1.5 billion students worldwide learning in non-English contexts. We position this study as a foundational step toward developing inclusive, multilingual benchmarks for programming education and advancing equitable AI-assisted learning in low-resource language environments.