Automating Code Generation for a New Ecosystem: Establishing Baselines with Large Language Model Based Code Generation for ArkTS and HarmonyOS


Abstract

Automated code generation has advanced significantly with large language models (LLMs), yet their performance on emerging domain-specific languages used in practical software development remains largely untested. This gap is particularly critical for major, rapidly growing platforms such as Huawei's HarmonyOS. Powering over 900 million devices, the HarmonyOS ecosystem is shifting to a native-only application environment with the HarmonyOS NEXT update. This shift makes development in its native ArkTS UI language essential and drives an urgent need for effective developer tools. Addressing this industry challenge, we conduct the first systematic evaluation of LLMs on their ability to generate valid ArkTS code, with the goal of accelerating the software development lifecycle and helping developers master this new language. To ground our evaluation in practical scenarios, we introduce two curated datasets: a test dataset, ArkTS-Test, and a training dataset for fine-tuning, both derived from common UI development tasks in ArkTS. Using these datasets, we evaluate a diverse range of LLMs, from large-scale proprietary models (e.g., Claude, Gemini, DeepSeek-V3, and Qwen3 Coder, ranging from 70B to 671B total parameters) to smaller open-source models (7B–14B), and report our observations. Next, we propose a methodology called Iterative Compilation Feedback (ICF), which enables LLMs to autonomously correct their own code by leveraging compiler error messages. Our experiments show that ICF boosts the syntactic accuracy of large-scale LLMs to as high as 91%. Furthermore, we show that fine-tuning a small-scale LLM (GPT-4o-mini) and combining it with our ICF method yields results comparable to the best-performing large-scale models. Finally, we conclude with a detailed categorization of compilation errors, identifying which error types our ICF method resolves most effectively and which persist due to gaps in model knowledge.
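The Iterative Compilation Feedback loop described above can be sketched as a simple driver: compile the generated code, and if diagnostics are produced, feed them back to the model and retry. This is a minimal illustration only; the names `compile_fn` and `fix_fn` are hypothetical stand-ins for the paper's actual ArkTS compiler integration and prompting setup, which the abstract does not specify.

```python
def iterative_compilation_feedback(code, compile_fn, fix_fn, max_rounds=3):
    """Repeatedly compile `code`; on errors, ask the model to repair it.

    compile_fn(code) -> list of error messages (empty list means success)
    fix_fn(code, errors) -> revised code produced by the LLM
    Returns (final_code, compiles_cleanly).
    """
    for _ in range(max_rounds):
        errors = compile_fn(code)          # e.g. ArkTS compiler diagnostics
        if not errors:
            return code, True              # code compiles cleanly
        code = fix_fn(code, errors)        # LLM rewrites code given the errors
    return code, not compile_fn(code)      # final check after last repair

# Toy stand-ins for demonstration: the "compiler" flags a missing keyword
# and the "model" inserts it on the next round.
def toy_compile(code):
    return [] if code.startswith("struct") else ["SyntaxError: expected 'struct'"]

def toy_fix(code, errors):
    return "struct " + code

fixed, ok = iterative_compilation_feedback("Index {}", toy_compile, toy_fix)
# fixed == "struct Index {}", ok == True
```

The loop terminates either when the compiler reports no errors or when the round budget is exhausted, so a model that cannot resolve a given error class (as the paper's error categorization discusses) simply returns its last attempt.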
