Automating Code Generation for a New Ecosystem: Establishing Baselines with Large Language Model Based Code Generation for ArkTS and HarmonyOS


Abstract

Automated code generation has advanced significantly with large language models (LLMs), yet their performance on emerging domain-specific languages used in practical software development remains largely untested. This gap is particularly critical for major, rapidly growing platforms such as Huawei's HarmonyOS. Powering over 900 million devices, the HarmonyOS ecosystem is shifting to a native-only application environment with the HarmonyOS NEXT update. This shift makes development in its native ArkTS UI language essential and drives an urgent need for effective developer tools. Addressing this industry challenge, we conduct the first systematic evaluation of LLMs on their ability to generate valid ArkTS code, with the goal of accelerating the software development lifecycle and helping developers master this new language. To ground our evaluation in practical scenarios, we introduce two curated datasets: a test dataset, ArkTS-Test, and a training dataset for fine-tuning, both derived from common UI development tasks in ArkTS. Using these datasets, we evaluate a diverse range of LLMs, from large-scale proprietary models (e.g., Claude, Gemini, DeepSeek-V3, and Qwen3 Coder, ranging from 70B to 671B total parameters) to smaller open-source models (7B–14B), and report our observations. Next, we propose a methodology called Iterative Compilation Feedback (ICF), which enables LLMs to autonomously correct their own code by leveraging compiler error messages. Our experiments show that ICF boosts the syntactic accuracy of large-scale LLMs to as high as 91%. Furthermore, we show that fine-tuning a small-scale LLM (GPT-4o-mini) and combining it with our ICF method yields results comparable to the best-performing large-scale models. Finally, we conclude with a detailed categorization of compilation errors, identifying which error types our ICF method resolves most effectively and which persist due to gaps in model knowledge.
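The Iterative Compilation Feedback loop described above can be sketched as a simple driver: compile the generated code, and if diagnostics are produced, feed them back to the model and retry. This is a minimal illustration only; the names `compile_fn` and `fix_fn` are hypothetical stand-ins for the paper's actual ArkTS compiler integration and prompting setup, which the abstract does not specify.

```python
def iterative_compilation_feedback(code, compile_fn, fix_fn, max_rounds=3):
    """Repeatedly compile `code`; on errors, ask the model to repair it.

    compile_fn(code) -> list of error messages (empty list means success)
    fix_fn(code, errors) -> revised code produced by the LLM
    Returns (final_code, compiles_cleanly).
    """
    for _ in range(max_rounds):
        errors = compile_fn(code)          # e.g. ArkTS compiler diagnostics
        if not errors:
            return code, True              # code compiles cleanly
        code = fix_fn(code, errors)        # LLM rewrites code given the errors
    return code, not compile_fn(code)      # final check after last repair

# Toy stand-ins for demonstration: the "compiler" flags a missing keyword
# and the "model" inserts it on the next round.
def toy_compile(code):
    return [] if code.startswith("struct") else ["SyntaxError: expected 'struct'"]

def toy_fix(code, errors):
    return "struct " + code

fixed, ok = iterative_compilation_feedback("Index {}", toy_compile, toy_fix)
# fixed == "struct Index {}", ok == True
```

The loop terminates either when the compiler reports no errors or when the round budget is exhausted, so a model that cannot resolve a given error class (as the paper's error categorization discusses) simply returns its last attempt.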
