Specializing Large Language Models for Process Modeling via Reinforcement Learning with Verifiable and Universal Rewards

Abstract

Large Language Models (LLMs) pretrained on generic text often struggle to generate correct and behaviorally accurate process models. To address this limitation, we apply Reinforcement Learning (RL) to specialize a pretrained LLM for the task of process modeling. Our RL approach combines automatically verifiable rewards, based on structural checks and behavioral footprints, with universal judgments provided by an LLM-as-a-Judge. We created a dataset of 1312 textual process descriptions with corresponding reference models to support Supervised Fine-Tuning and RL. Experiments demonstrate that RL significantly reduces invalid model generations, improves behavioral correctness, and enables control over model complexity. Evaluations on the ProMoAI benchmark confirm that our RL-trained checkpoint achieves performance close to state-of-the-art models, such as GPT-4o, while producing fewer invalid generations.
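As a rough illustration of the reward design summarized above, the sketch below combines a structural validity gate, a behavioral-footprint score against the reference model, and an LLM-as-a-Judge rating into a single scalar reward. The function name, signature, weights, and gating scheme are illustrative assumptions, not the implementation from the paper.

```python
def combined_reward(is_valid: bool,
                    footprint_fitness: float,
                    judge_score: float,
                    w_behavioral: float = 0.6,
                    w_judge: float = 0.4) -> float:
    """Hypothetical combination of verifiable checks and an LLM-as-a-Judge score.

    is_valid          -- outcome of structural checks (e.g., the model parses and is well-formed)
    footprint_fitness -- behavioral-footprint agreement with the reference model, in [0, 1]
    judge_score       -- normalized LLM-as-a-Judge quality rating, in [0, 1]
    """
    # Hard gate: a structurally invalid model receives no reward at all,
    # which directly discourages invalid generations during RL.
    if not is_valid:
        return 0.0
    # Otherwise, blend the verifiable behavioral score with the judge's rating.
    return w_behavioral * footprint_fitness + w_judge * judge_score


# Example: a valid model with strong footprint agreement and a decent judge rating.
print(combined_reward(True, 0.9, 0.7))  # 0.82
```

One design choice this sketch makes explicit is gating on validity before any weighted blending, so that the verifiable component dominates whenever the generated model cannot even be checked; whether the paper uses gating, weighting, or another aggregation is not stated in the abstract.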
