Specializing Large Language Models for Process Modeling via Reinforcement Learning with Verifiable and Universal Rewards

Abstract

Large Language Models (LLMs) pretrained on generic text often struggle to generate correct and behaviorally accurate process models. To address this limitation, we apply Reinforcement Learning (RL) to specialize a pretrained LLM for the task of process modeling. Our RL approach combines automatically verifiable rewards, based on structural checks and behavioral footprints, with universal judgments provided by an LLM-as-a-Judge. We created a dataset of 1312 textual process descriptions with corresponding reference models to support Supervised Fine-Tuning and RL. Experiments demonstrate that RL significantly reduces invalid model generations, improves behavioral correctness, and enables control over model complexity. Evaluations on the ProMoAI benchmark confirm that our RL-trained checkpoint achieves performance close to state-of-the-art models, such as GPT-4o, while producing fewer invalid generations.
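As a rough illustration of the reward design summarized above, the sketch below combines a structural validity gate, a behavioral-footprint score against the reference model, and an LLM-as-a-Judge rating into a single scalar reward. The function name, signature, weights, and gating scheme are illustrative assumptions, not the implementation from the paper.

```python
def combined_reward(is_valid: bool,
                    footprint_fitness: float,
                    judge_score: float,
                    w_behavioral: float = 0.6,
                    w_judge: float = 0.4) -> float:
    """Hypothetical combination of verifiable checks and an LLM-as-a-Judge score.

    is_valid          -- outcome of structural checks (e.g., the model parses and is well-formed)
    footprint_fitness -- behavioral-footprint agreement with the reference model, in [0, 1]
    judge_score       -- normalized LLM-as-a-Judge quality rating, in [0, 1]
    """
    # Hard gate: a structurally invalid model receives no reward at all,
    # which directly discourages invalid generations during RL.
    if not is_valid:
        return 0.0
    # Otherwise, blend the verifiable behavioral score with the judge's rating.
    return w_behavioral * footprint_fitness + w_judge * judge_score


# Example: a valid model with strong footprint agreement and a decent judge rating.
print(combined_reward(True, 0.9, 0.7))  # 0.82
```

One design choice this sketch makes explicit is gating on validity before any weighted blending, so that the verifiable component dominates whenever the generated model cannot even be checked; whether the paper uses gating, weighting, or another aggregation is not stated in the abstract.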
