Student-Centric Evaluation Survey for Exploring the Impact of LLMs on UML Modeling

Abstract

Unified Modeling Language (UML) diagrams are essential tools for visualizing system structure and behavior in software design. With the rise of Large Language Models (LLMs) in automating various phases of software development, there is growing interest in automating UML diagram generation. To that end, this study presents an empirical investigation into the effectiveness of an LLM, GPT-4-turbo, in generating structural (Class, Deployment) and behavioral (Use Case, Sequence) UML diagrams. A rule-based prompt engineering approach was developed to transform domain scenarios, extracted from a widely used UML textbook, into optimized prompts for the LLM. UML diagrams were then automatically synthesized with PlantUML and evaluated through a survey of 121 computer science and software engineering students across three U.S. universities. Participants assessed the completeness and correctness of both LLM-assisted and human-created diagrams by checking the constituent elements of each UML diagram. Statistical analyses, including paired t-tests, Wilcoxon signed-rank tests, and Pearson correlation, were conducted to validate the results. Findings revealed that LLM-assisted diagrams achieved completeness and correctness scores of 65% and 61.1% (Class), 65.9% and 64.3% (Deployment), 67.1% and 64.2% (Use Case), and 67.7% and 66.2% (Sequence). Human-created diagrams scored 79.8% and 76.3% (Class), 70% and 73% (Deployment), 80.7% and 80.4% (Use Case), and 73.2% and 72.6% (Sequence). Overall, Class and Use Case diagrams show weaker alignment with human-created models, while Deployment and Sequence diagrams show stronger alignment.
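As a rough illustration of the validation step described in the abstract, the following is a minimal sketch, not the authors' actual analysis code, of how paired t-tests, Wilcoxon signed-rank tests, and Pearson correlation could be run on paired per-rater scores with SciPy. The score arrays are hypothetical placeholders, not data from the study.

```python
# Minimal sketch of the statistical validation step, assuming paired
# per-participant scores for LLM-assisted vs. human-created diagrams.
# The arrays below are hypothetical placeholders, not the study's data.
import numpy as np
from scipy import stats

# Hypothetical completeness scores (0-100) for one diagram type,
# paired by participant: each rater scored both diagram sources.
llm_scores = np.array([65, 70, 60, 68, 63, 72, 66, 61])
human_scores = np.array([80, 78, 75, 82, 77, 79, 81, 76])

# Paired t-test: do mean scores differ between the two diagram sources?
t_stat, t_p = stats.ttest_rel(llm_scores, human_scores)

# Wilcoxon signed-rank test: non-parametric check of the same pairing.
w_stat, w_p = stats.wilcoxon(llm_scores, human_scores)

# Pearson correlation: do raters who score LLM-assisted diagrams highly
# also score human-created diagrams highly?
r, r_p = stats.pearsonr(llm_scores, human_scores)

print(f"paired t-test: t={t_stat:.3f}, p={t_p:.4f}")
print(f"Wilcoxon:      W={w_stat:.3f}, p={w_p:.4f}")
print(f"Pearson:       r={r:.3f}, p={r_p:.4f}")
```

Running the paired t-test and Wilcoxon test together, as the study reports, is a common design choice: the Wilcoxon test guards the conclusion against violations of the t-test's normality assumption on the paired score differences.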
