Do LLMs Speak BPMN? An Evaluation of Their Process Modeling Capabilities Based on Quality Measures
Abstract
Large Language Models (LLMs) are emerging as powerful tools for automating business process modeling, promising to streamline the translation of textual process descriptions into Business Process Model and Notation (BPMN) diagrams. However, the extent to which these AI systems can produce high-quality BPMN models has not yet been rigorously evaluated. This paper presents an early evaluation of five LLM-powered BPMN generation tools that automatically convert textual process descriptions into BPMN models. To assess the quality of these AI-generated models, we introduce a novel structured evaluation framework that scores each BPMN diagram across three key process model quality dimensions: clarity/readability, correctness, and completeness, covering both accuracy and diagram understandability. Using this framework, we conducted experiments in which each tool was tasked with modeling the same set of textual process scenarios, and the resulting diagrams were systematically scored against these criteria. This approach provides a consistent and repeatable evaluation procedure and offers a new lens for comparing LLM-based modeling capabilities. Our findings reveal that while current LLM-based tools can produce BPMN diagrams that capture the main elements of a process description, they often exhibit errors such as missing steps, inconsistent logic, or modeling rule violations, highlighting limitations in achieving fully correct and complete models. The clarity and readability of the generated diagrams also vary, indicating that these AI models are still maturing in generating easily interpretable process flows. We conclude that although LLMs show promise in automating BPMN modeling, significant improvements are needed for them to consistently generate both syntactically and semantically valid process models.