A Student-Centric Evaluation Survey to Explore the Impact of LLMs on UML Modeling

Abstract

Unified Modeling Language (UML) diagrams serve as essential tools for visualizing system structure and behavior in software design. With the emergence of Large Language Models (LLMs) that automate various phases of software development, there is growing interest in leveraging these models for UML diagram generation. This study presents a comprehensive empirical investigation into the effectiveness of GPT-4-turbo in generating four fundamental UML diagram types: Class, Deployment, Use Case, and Sequence diagrams. We developed a novel rule-based prompt-engineering framework that transforms domain scenarios into optimized prompts for LLM processing. The generated diagrams were then rendered using PlantUML and evaluated through a rigorous survey involving 121 computer science and software engineering students across three U.S. universities. Participants assessed both the completeness and correctness of LLM-assisted and human-created diagrams by examining specific elements within each diagram type. Statistical analyses, including paired t-tests, Wilcoxon signed-rank tests, and effect size calculations, confirm the significance of our findings. The results reveal that while LLM-assisted diagrams achieve meaningful levels of completeness and correctness (ranging from 61.1% to 67.7%), they consistently underperform compared to human-created diagrams. The performance gap varies by diagram type, with Sequence diagrams showing the closest alignment to human quality and Use Case diagrams exhibiting the largest discrepancy. This research contributes a validated framework for evaluating LLM-generated UML diagrams and provides empirically grounded insights into the current capabilities and limitations of LLMs in software modeling education.
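The pipeline the abstract describes (rule-based prompt engineering, GPT-4-turbo generation, PlantUML rendering) can be pictured with a short sketch. This is a minimal illustration assuming the OpenAI chat-completions Python client; the RULES list and the build_prompt and generate_plantuml helpers are hypothetical stand-ins, since the paper's actual rule set and prompt templates are not reproduced here.

```python
# Hypothetical sketch of the prompt-to-PlantUML pipeline; names and the
# rule set below are illustrative assumptions, not the paper's code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Toy stand-in for a rule-based prompt-engineering step: each rule
# rewrites the raw domain scenario into a more constrained prompt.
RULES = [
    lambda s: "You are a UML modeling assistant.\n" + s,
    lambda s: s + "\nRespond only with PlantUML source between @startuml and @enduml.",
]

def build_prompt(scenario: str, diagram_type: str) -> str:
    prompt = f"Generate a UML {diagram_type} diagram for this scenario:\n{scenario}"
    for rule in RULES:
        prompt = rule(prompt)
    return prompt

def generate_plantuml(scenario: str, diagram_type: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": build_prompt(scenario, diagram_type)}],
    )
    # The returned PlantUML source can then be rendered to an image
    # with the standalone PlantUML tool.
    return response.choices[0].message.content

if __name__ == "__main__":
    print(generate_plantuml("A customer places an order online.", "Sequence"))
```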
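The statistical comparison named above (paired t-tests, Wilcoxon signed-rank tests, and effect sizes) maps onto standard SciPy calls. A minimal sketch follows; the score arrays are placeholder values for illustration, not the study's data.

```python
# Illustrative only: placeholder paired ratings of LLM-assisted vs.
# human-created diagrams for one diagram type, scored 0-1.
import numpy as np
from scipy import stats

llm_scores = np.array([0.62, 0.65, 0.58, 0.70, 0.61, 0.66, 0.63, 0.59])
human_scores = np.array([0.78, 0.81, 0.74, 0.85, 0.77, 0.80, 0.79, 0.76])

# Parametric and non-parametric paired comparisons.
t_stat, t_p = stats.ttest_rel(llm_scores, human_scores)
w_stat, w_p = stats.wilcoxon(llm_scores, human_scores)

# Cohen's d for paired samples: mean difference over the SD of differences.
diff = llm_scores - human_scores
cohens_d = diff.mean() / diff.std(ddof=1)

print(f"paired t-test: t={t_stat:.3f}, p={t_p:.4f}")
print(f"Wilcoxon signed-rank: W={w_stat:.1f}, p={w_p:.4f}")
print(f"Cohen's d (paired): {cohens_d:.2f}")
```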
